Professor Karl Pearson wrote the following statement, "The field of
science is unlimited; its material is endless, every group of natural
phenomena, every phase of social life, every stage of past or present
development is material for science." If any one field of science
exemplifies these remarks, it is the field of statistics. The papers
in these Proceedings indicate a few areas where statistics and the
design of experiments are helping the Army solve some of its many
problems. Weapon system analysis is just one of those fields where
statistics plays an important role. To bring this out, we quote a
paragraph  by  Dr.  Frank  E.  Grubbs  which  appears  in  the  Engineering 
Design  Handbook:  DARCOM-P  706-101.  "Chapter  21  brings  us  to  the 
increasingly  important  topics  of  reliability,  life  testing,  availability 
and  maintainability  of  systems,  and  reliability  growth.  There  is  hardly 
any  weapon  system  today  which  can  or  should  escape  analyses  in  terms  of 
these  fields  of  interest,  and  the  analyst  must  be  highly  competent  in 
evaluations associated with life-time or failure distributions such as
the exponential, the Weibull, the lognormal, and the binomial probability
distributions.  Statistical  testing  for  high  reliability  and  safety  of 
systems  is  introduced  in  Chapter  21,  as  well  as  a brief  account  of 
reliability  growth.  A major  topic,  and  current  effort,  concerning  systems 
today  is  that  of  being  able  to  place  confidence  bounds  on  the  true, 
unknown  reliability  of  complex  systems;  accordingly,  coverage  of  the  more 
recent  and  accurate  techniques  is  given  for  the  practicing  analyst.  Finally, 
reliability  now  is  often  one  of  the  major  or  sole,  characteristics  of  some 
weapon  systems,  and  hence  may  represent  a prime  activity  for  the  systems 
analyst  in  many  applications  of  his  knowledge." 
Except for the Nineteenth Conference on the Design of Experiments in
Army Research, Development and Testing, which was conducted at Rock
Island Arsenal, Rock Island, Illinois, the first twenty-two meetings
of this series of symposia were held on the east coast. The concentration
of Army installations in this area played a key role in selecting the
hosts  for  these  meetings.  The  Army  Mathematics  Steering  Committee  (AMSC) 
sponsors  these  conferences  on  behalf  of  the  Office  of  the  Chief  of  Research, 
Development  and  Acquisition.  Members  of  the  subcommittee  on  Probability 
and  Statistics,  whose  responsibility  it  is  to  organize  the  Design 
Conferences  for  the  AMSC,  had  some  misgivings  about  holding  the  twenty-third 
meeting on the west coast. But these doubts were dispelled by the fact
that the number of attendees as well as the number of contributed papers
matched  those  of  the  east  coast  meetings.  One  anomaly  did  occur.  Instead 
of  having  one  fourth  of  the  contributed  papers  classified  as  clinical, 
in  the  California  meeting  nearly  one  half  were  in  this  category. 

The host for the twenty-third Design of Experiments Conference was the
U. S. Army Combat Development Experimentation Command, Fort Ord, California.
Excellent facilities for holding this meeting on 19-21 October 1977 were
provided by the Naval Postgraduate School. Dr. Marion R. Bryson, acting
for the host for the conference, served as Chairman on Local Arrangements.
He was assisted in this task by Mr. John E. Banks and several other members
of  his  staff.  Those  in  attendance  are  grateful  to  them  for  so  ably  carrying 
out  the  many  tasks  that  needed  to  be  handled  before  and  during  the  course 
of  a meeting  of  this  size. 
The five nationally known invited speakers together with the titles
of their addresses are listed below. These gentlemen gave those in
attendance an opportunity to hear about recent developments in the
field of statistics.

Speaker and Institution          Area of Talk

Prof. H. O. Hartley              Analysis of Unbalanced
Texas A&M University             Experiments

Prof. Norman Breslow             Censored Data
University of Washington

Prof. Rupert Miller              The Jackknife: Survey
Stanford University              and Applications

Prof. Donald P. Gaver            Estimation of Complex System
Naval Postgraduate School        Availability

Prof. G. E. P. Box               Time Series Modelling
University of Wisconsin

Dr. Churchill Eisenhart was recipient this year of the Samuel S. Wilks
Memorial Medal. He richly deserves this honor for his scientific
contributions. He has played many important roles in the conducting
of these conferences. At this meeting there were forty-two contributed
papers. Twenty-two of these were classified as technical and the rest
were presented in clinical sessions. Ninety-six persons registered for
the conference, but there were one hundred and eighteen individuals
who attended the opening session.

The members of the AMSC are duly aware of all the effort that goes into
making these conferences such memorable events. Their thanks go to
all those in attendance. The speakers in particular need recognition
for all the time they spent in preparing and delivering their interesting
papers. Dr. Frank E. Grubbs and Professor Herbert Solomon, who
respectively served as Program Chairman and Chairman of the conference,
are to be congratulated for guiding to conclusion another successful
scientific meeting.
ANALYSIS OF UNBALANCED EXPERIMENTS


H. O. Hartley
Texas A&M University
College Station, Texas 77843


1.  Introduction 

The title of this talk is rather general, and I should explain,
therefore, that it is really confined to a limited number of aspects
of the area covered by the title. I am restricting myself to so-called
"multiple factor" experiments, that is, experiments in which "responses"
are measured under experimental "conditions" described by specifying
the "levels" for each of several "factors." The well-known "factorial
experiments" represent a special case of a balanced and multiple factor
experiment in which precisely one (or precisely an equal number of)
experimental unit(s) is (are) used at all possible combinations of factor
levels. An unbalanced experiment will have unequal numbers of units
(including zero units) exposed at the possible factor-level combinations.

There  are  two  main  causes  of  unbalance: 

(i)  Experiments originally designed as balanced experiments have
     become unbalanced through "accidents." The best known
     examples are the so-called "missing value" or "missing plot"
     situations in which the response for a number of units entered
     into the experiment has been lost or has been rejected as an
     "outlier" generated by extraneous error-sources. Other
     "accidents" lead to the "censorship" or "grouping" of some or
     all of the responses. This means that these responses are
     not known "exactly" but are known to lie within certain ranges
     of the response and measurement scale. For other situations
     of unbalance described as "incomplete data" see e.g. Hartley
     and Hocking (1971).

(ii) The unbalanced data have not arisen from a designed experiment
     but are the results of an operational study involving multiple
     classifications of sampled units by numerous factors, invariably
     leading to unequal representations of the "cells" (factor-level
     combinations) and usually involving many zero-cells.

Finally,  our  concept  of  "analysis"  is  here  confined  to  the  problem 
of  estimating  the  parameters  in  a linearly  additive  model  postulated  for 
the  data.  More  specifically,  we  shall  be  concerned  with  the  so-called 
mixed  analysis  of  variance  model.  Briefly  in  this  model,  the  observed 
response is the sum total of a mean response plus additive effects
contributed by "effect constants" of the applied levels of the "fixed factors"
plus the random "effect variables" of the applied levels of the "random
factors."  This  model  is  illustrated  by  the  examples  of  Section  2 and 
mathematically  defined  in  the  Appendix. 
In limiting our present objectives to the estimation of parameters,
we omit the important aspect of the drawing of inferences from the data.
However, we do not omit to stress that in the case of (ii), when unbalanced
operational data are analyzed, the drawing of inferences of a causative
nature is particularly hazardous and requires the examination of potential
"latent variables" (see e.g. Box (1966), Hartley (1967)) causing spurious
input-response relationships.

2.  Illustrative  Examples  of  Unbalanced  Data 

Before  turning  to  the  mathematical  details  of  the  estimation  theory, 
it  may  be  helpful  to  illustrate  the  concepts  of  Section  1 by  examples. 
These  examples  illustrate  the  various  sources  of  unbalance.  At  the  same 
time  they  recapitulate  the  well-known  concepts  of  "fixed  factors"  and 
"random  factors" in  analysis  of  variance. 

Example 2.1. (O. L. Davies (1956) pp. 296-297).

We  quote  from  Davies. 

The  following  is  an  example  of  an  experimental  design  of 
general  utility  in  many  fields.  It  relates  to  the  testing  of 
nine  aluminum  alloys  for  their  resistance  to  corrosion  in  a 
chemical plant atmosphere. Four sites in the factory were
chosen, and on each of them a plate made from each alloy was
exposed for a year. The plates were then submitted to four
observers, who assessed their condition visually and awarded
marks to each from 0 to 10 according to the degree of resistance
to attack. The observers worked independently and the
plates  were  submitted  to  them  in  random  order;  in  other  words 
the  observers  did  not  assess  all  plates  from  one  site  at  the 
same  time.  ...  The  aim  of  the  experiment  was  to  decide  which, 
if  any,  of  the  alloys  were  suitable  for  use  in  the  factory, 
and  especially  to  select  any  found  to  be  suitable  on  all  the 
sites.  It  was  also  required  to  know  whether  the  four  observers 
agreed  in  their  relative  assessments. 

Basically  the  experiment  is  a balanced  9x4x4  factorial  in  which 
plates  from  9 aluminum  alloys  are  exposed  at  each  of  four  different 
plant  sites  and  these  are  inspected  by  each  of  four  observers.  The 
mixed  ANOVA  model  (not  spelled  out  by  Davies)  that  appears  to  underlie 
his analysis is as follows:

$$y_{ios} = \mu + \alpha_i + b_o + c_s + u_{is} + e_{ios} \qquad [1]$$

where

$y_{ios}$ = score of ith alloy on sth site tested by oth observer,

$\mu$ = mean score,

$\alpha_i$ = differential effect constant of ith alloy (fixed factor),

$b_o$ = effect variable of oth observer (random factor),

$c_s$ = effect variable of sth site (random factor),

$u_{is}$ = interaction variable of ith alloy by sth site (random factor),

$e_{ios}$ = error.

Note  that  the  sites  are  considered  random  variables  since  inferences 
are  desired  for  the  plant  as  a whole  and  not  just  for  the  experimental 
sites.  It  seems  reasonable  that  a random  interaction  variable  between 
sites  and  alloys  is  provided  (which  is  rightly  used  as  the  valid  error 
for  comparing  alloys)  but  that  interactions  between  observers  and  alloys 
or observers and sites are regarded as negligible. The above experiment is,
of course, balanced and the standard analysis consequential to the above
model is given by Davies. In realistic situations unbalance may easily
arise through "accidents" such as certain scores getting lost or becoming
invalid. We should, however, point out that the so-called "missing value
analysis"  is  strictly  speaking  correct  only  if  all  factors  are  fixed. 
However,  the  data  may  be  analyzed  by  the  method  given  in  Appendix  1. 

Example 2.2. (O. H. Pfeiffer (1964)).

This  is  an  experiment  to  evaluate  the  performance  of  swivel  hook- 
type  cross  chain  fasteners  of  tire  chains.  Again  the  experimental  design 
was  balanced  as  described  by  Pfeiffer.  Briefly,  the  test  comprised  8 
"wheel-blocks"  in  the  form  of  the  8 tires  of  the  4 rear  dual  wheels  and 
these  "blocks"  were  regarded  as  a factorial  arrangement  of  three  2-level 
factors, viz. "front duals" versus "rear duals," "right duals" versus
"left duals," and outside wheels versus inside wheels. Within each
"block"  the  3 "treatments"  consisted  of  3 "clusters"  of  three  different 
types  of  hook  fasteners,  each  cluster  comprising  4 individual  fasteners. 

The  main  response  measured  for  each  fastener  was  the  log  of  its  miles  to 
failure. 

Turning  then  to  the  factors,  the  type  of  fastener  is  clearly  a fixed 
treatment  factor  and  the  individual  fasteners  a random  repetition  factor 
from the population of fasteners of each type but tested within a "cluster"
on the wheel. The tires are also a random factor since inferences must
not be restricted to the particular set of 8 tires used in the test but
they have positional "main treatments" superimposed in the form of the
above $2^3$ factorial. Pfeiffer uses (we think conservatively) the tire x
type interaction as an error which, of course, also includes any position x
type interaction. This decision is proved correct since the tire x type
mean square is virtually identical with the within type mean square.

In this experiment unbalance arose through accidental censorship:
Certain  fasteners  had  not  failed  when  the  experiment  was  terminated  at 
425  miles.  Since  the  missing  values  are  all  known  to  exceed  log  425,  the 
customary  missing  value  analysis  (which  assumes  that  the  missing  values 
are a random selection from the experimental responses) is not appropriate.
Likewise  the  analysis  of  the  observed  miles  to  failure  as  an  unbalanced 
experiment  is  not  appropriate  as  it  would  disregard  the  censored  information. 
An  appropriate  analysis  would  be  an  iterative  EM  algorithm  consisting  of 
the  following  steps. 

STEP (E): For each missing value compute its conditional expectation, E,
given that it exceeds the value $\ell = \log 425$. This is given by

$$E = \mu + \sigma \, \frac{Z\left(\frac{\ell - \mu}{\sigma}\right)}{Q\left(\frac{\ell - \mu}{\sigma}\right)} \qquad [2]$$

where

$\mu$ = iterative estimate of the cell mean for the missing value
computed from the current estimates of the linear ANOVA model,

$\sigma^2$ = iterative estimate of the within cell variance,

and $Z(\,)$, $Q(\,)$ are respectively the standard normal ordinate and tail area.
The assumption of an approximate normal within cell distribution of log
miles to failure requires checking.

STEP (M): Using all values of E computed by [2] along with the observed
log miles to failure records, compute the customary balanced ANOVA estimates
of all terms in the additive ANOVA model and return to STEP (E).

The symbol (M) of the second step stands for Maximum Likelihood estimation
and the term EM algorithm was introduced by Dempster, Laird and Rubin
(1977). Earlier accounts of the algorithm are given by Hartley (1958) and
Hartley and Hocking (1971).
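
As a minimal sketch of STEP (E), assuming Python (which the original talk does not use), the conditional expectation [2] can be computed from the standard library alone. The function names, the cell estimates and the choice of base-10 logarithms are illustrative assumptions, not values from Pfeiffer's data.

```python
import math

def normal_ordinate(t):
    """Standard normal density Z(t)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def normal_tail_area(t):
    """Upper tail area Q(t) = P(standard normal > t)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def censored_expectation(mu, sigma, limit):
    """Conditional mean of a N(mu, sigma^2) response given it exceeds limit."""
    t = (limit - mu) / sigma
    return mu + sigma * normal_ordinate(t) / normal_tail_area(t)

# Hypothetical current estimates for one cell (not taken from Pfeiffer's data);
# the base of the logarithm is also an assumption made for this sketch.
limit = math.log10(425.0)
filled_value = censored_expectation(mu=2.5, sigma=0.15, limit=limit)
```

In a full EM cycle each censored record would be replaced by such a value before the balanced ANOVA estimates of STEP (M) are recomputed.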

Example 2.3. (R. Bell (1963) p. 623).

"This  paper  presents  a typical  analysis  of  service  practice 
firing  results  and  indicates  the  significance  of  these  results 
in  the  Surveillance  Program.  An  example  of  the  evaluation  of  the 
annual  service  practice  firings  for  the  Honest  John  Rocket  is 
presented.

"934 Firings of Rocket 762MM: M31 Series, conducted for troop
training and other purposes by both United States and NATO firing
units have been considered. The purpose of this study was to
investigate the overall accuracy performance of the M31 rocket
system when fired by troop units and to establish if there is any
indication of a deterioration of this accuracy performance with
increasing age of the M6 series rocket motors of these M31 series
rockets."
More specifically the operational data bank used in the study
consisted of all firings during 6 years (1958-63) by 3 launchers
(289, 386, 33) using rocket motors of varying ages (1-, 1 to 2-, ...,
7 to 8-). The 6 x 3 x 8 factorial table was by necessity unbalanced
with  many  "zero  cells."  Among  other  sources  of  unbalance  there  was  a 
tendency  of  the  older  rocket  motors  to  be  more  heavily  represented  in 
the  later  years.  The  data  were  acquired  operationally  over  the  years 
and  the  analysis  here  carried  out  was  not  originally  planned. 

Of  the  three  factors  both  the  3 launchers  and  the  8 ages  are  fixed 
but  there  may  be  some  question  as  to  how  the  6 years  should  be  treated. 
Inferences  are  obviously  required  for  the  period  subsequent  to  that 
covered  by  the  data  bank  and  there  may  be  doubt  as  to  whether  conditions 
in  1958  to  1963  should  be  regarded  as  a random  sample  of  those  prevailing 
in  future  years.  However,  if  such  a proposition  is  accepted,  the  analysis 
of Appendix 1 could be applied to obtain estimates of the age and launcher
contrasts and their interaction as well as estimates of components of
variance attributable to year to year variation, the year x age, and the
year x launcher interactions.

If there is some doubt about the representativeness of years 1958
to 1963 of future conditions, no useful inferences can be made unless
a time series model can be formulated.

3.  Relation  Between  Various  Methods  in  Balanced  and  Unbalanced  Data 
Analysis 

As  is  well  known  the  analysis  of  variance  of  balanced  factorial  data 
makes  a distinction  between  the  so-called  "fixed  factors"  and  "random 
factors."  These  concepts  were  introduced  in  Section  1 and  illustrated  in 
Section 2 by three examples. The same distinction must be made when
analyzing  unbalanced  data.  In  the  two-way  table  below  we  distinguish  two 
main  types  of  ANOVA's,  namely  (i)  an  analysis  in  which  all  factors  (except 
the  error)  are  fixed  which  is  contrasted  with  (ii)  the  so-called  mixed 
ANOVA, a situation where some factors are fixed but others are random.
The so-called all random model is included in this case as one in which
the only fixed parameter is the mean response. Of course (i) is also a
special case of (ii), namely the case in which the only random factor is
the error.

The  row  headings  in  the  table  are  (a)  balanced  data  and  (c)  unbalanced 
data,  but  an  intermediate  situation  (b)  is  provided  in  which  the  data  are 
"almost  balanced"  (notably  missing  value  situations).  In  the  body  of  the 
table  we  give  very  brief  descriptions  of  the  appropriate  analysis  but  would 
amplify  these  as  follows: 
TABLE 1

Relation Between Various Methods in Balanced and Unbalanced Data Analysis

                   (i)                          (ii)
                   All Factors Fixed            Some Factors Fixed,
                                                Some Random

(a) Balanced       ANOVA or regression          ANOVA and estimation
    Data           analysis on dummy            of components of
                   variables                    variance

(b) Almost         Missing value formulas       Missing value formulas,
    Balanced       ANOVA - ML EST's,            heuristic ANOVA
    Data           tests approximate            approximate

(c) Unbalanced     Regression analysis on       Mixed model ANOVA
    Data           dummy variables.             components of variance
                   Exact Max. Likelihood        estimation,
                   estimation and hypothesis    Estimation of constants,
                   tests                        Max. Likelihood
                                                Minque
                                                Present Method
(i)(a)   If the random (equal variance) error is the only random factor,
         the data are of the form of a linear model $y = X\beta + e$ with the
         design matrix X consisting of 0, 1 "dummy variables." After
         reparameterization of $\beta$ (to make X non-singular) the regression
         analysis is identical with the balanced data ANOVA provided we
         adopt the accepted hierarchy of factors: main effects followed
         by two factor interactions, etc.

(i)(c)   The same applies to the case of all factors fixed unbalanced
         data banks except that the reparameterization is more dependent
         on the adopted hierarchy in which the factors are ordered.

(i)(b)   This case is separated from (i)(c) in that it is often a
         computational advantage to reduce the case of almost balanced
         data to that of balanced data by a missing value EM type
         algorithm.

(ii)(a)  The simultaneous estimation of effect-constants and components
         of variance in a balanced ANOVA is well documented in the
         statistical literature. The (unbiased) estimation procedure
         may, however, lead to negative estimates of variance components
         for which various remedies are advocated.

(ii)(b)  It should be stressed that the customary missing value estimates
         are M.L. estimates only for the all fixed factor models. Therefore
         an accurate treatment must reduce this case to (ii)(c).

(ii)(c)  This is the most general situation and a computationally convenient
         method is described in Appendix 1 which follows. Note that all six
         situations (i)(a), (b), (c); (ii)(a), (b), (c) could be regarded as
         special cases of (ii)(c).

Before turning to a more detailed discussion of (ii)(c) in Appendix 1,
we should stress that it does not cover unbalance through censorship and an
E-algorithm should be adjoined to the M.L. estimation treatment briefly
referred to in Appendix 1.
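
Cases (i)(a) and (i)(c) can be illustrated with a small sketch, assuming Python with NumPy (neither appears in the original talk). The factor levels and effect values are invented for illustration, and a minimum-norm least-squares solve stands in here for the reparameterization of $\beta$ described in (i)(a).

```python
import numpy as np

def dummy_matrix(levels, n_levels):
    """0/1 indicator columns, one per level of a factor."""
    D = np.zeros((len(levels), n_levels))
    D[np.arange(len(levels)), levels] = 1.0
    return D

# Hypothetical unbalanced two-factor layout (levels counted from 0):
# cell (A=1, B=1) holds two units and cell (A=2, B=1) is empty.
a = np.array([0, 0, 1, 1, 1, 2])   # levels of factor A (3 levels)
b = np.array([0, 1, 0, 1, 1, 0])   # levels of factor B (2 levels)

# Invented additive effects, used only to generate noise-free responses.
mu = 10.0
alpha = np.array([0.0, 1.5, -2.0])
beta = np.array([0.0, 0.7])
y = mu + alpha[a] + beta[b]

# Over-parameterized design matrix: mean column, A dummies, B dummies.
X = np.hstack([np.ones((len(y), 1)), dummy_matrix(a, 3), dummy_matrix(b, 2)])

# lstsq returns the minimum-norm solution when X is rank deficient.
coef, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef
```

The design matrix has six columns but rank four, which is the singularity that the reparameterization is meant to remove; the fitted values are unaffected by which solution is chosen.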
APPENDIX 1+

A SIMPLE 'SYNTHESIS'-BASED METHOD OF VARIANCE COMPONENT ESTIMATION

by

H. O. Hartley*, J. N. K. Rao† and Lynn LaMotte‡

* H. O. Hartley, Institute of Statistics, Texas A&M University
† J. N. K. Rao, Carleton University, Ottawa
‡ Lynn LaMotte, Quantitative Management Science, University of Houston
+ A shortened version of Appendices 1 and 2 will be published in Biometrics.

1. Introduction

Two of us (HOH and JNKR) have recently had occasion (see Hartley and
Rao (1977)) to consider components of variance estimation techniques in
data banks arising from sample surveys. Such data banks differ from those
encountered in experimental designs in that the "number of observations",
n (in our case the number of elementary sampling units) is exceedingly large.
We have therefore been prompted to search for computationally efficient
methods for the estimation of components of variance when n is large, and
the algorithm here described involves a computational effort (as measured
by the number of products) which is a linear function of n; this is
generally regarded as computationally highly efficient. While our
algorithm is new the statistical method of estimation we employ is not.
In fact it represents a special case of C. R. Rao's (1971) MINQUE (with
V = I). It is also identical (communication by S. R. Searle) with a special
case of the first iterate solution of the REML equations of Corbeil and
Searle (1976) whose algorithms appear to involve much larger computational
efforts (proportional to $n^2$). The computational effort is also considerably
less than that involved in the M.L. estimation by Hartley and Rao
(1967) which is still fairly laborious in spite of the improvements
through the W-transformation by Hemmerle and Hartley (1973).

In spite of its computational simplicity the estimation procedure has
numerous "optimality properties". Apart from being a special case of
MINQUE, other properties are established in Section 6 and the asymptotic
consistency is proved in the Appendix under fairly general conditions.

The  consistency  of  our  estimator  makes  it  convenient  as  a starting  point 
for  a single  M.L.  cycle  to  obtain  asymptotically  fully  efficient  estimates. 

Finally we establish simple conditions for the estimability of all
variance components by our method (see Section 6). In this context we
observe that with other methods (such as the Henderson 3 method (Henderson
(1953)) or the Abbreviated Doolittle and square root method (see e.g.
Gaylor, Lucas and Anderson (1970))) estimability depends on the subjective
ordering of the components (such as with the Forward Doolittle procedure)
and if the ordering is unfortunate the method may fail to yield estimates
for certain components while with a different ordering (not attempted) all
components may well be estimable.


2. The Mixed ANOVA Model

Employing the currently used notation we write the mixed ANOVA model
in the form

$$y = X\alpha + \sum_{i=1}^{c+1} U_i b_i \qquad (1)$$

where

y is an n x 1 vector of observations,

X is an n x k matrix of known coefficients,

$\alpha$ is a k x 1 vector of unknown constants,

$U_i$ is an $n \times m_i$ matrix of 0, 1 coefficients,

$b_i$ is an $m_i \times 1$ vector of normal variables from $N(0, \sigma_i^2)$.

Specifically $U_{c+1} = I_n$ and $b_{c+1}$ is an n-vector of "error variables".
Moreover the design matrices $U_i$ have precisely one value of 1 in each of
their rows and all other coefficients 0. We denote by $m = \sum_{i=1}^{c} m_i$ the
total number of random levels.

We may assume without loss of generality that

$$X'X = I \qquad (2)$$

for if (2) is not satisfied we may orthogonalize X by a Gram-Schmidt
orthogonalization process with a consequential reparameterization of $\alpha$,
omitting any linearly dependent columns in the Gram-Schmidt process.
Usually the first column of X is the column vector with all elements equal to
$1/\sqrt{n}$. It is the objective of the method to compute estimates of the
variance components $\sigma_i^2$ and the vector $\alpha$.
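
The orthogonalization behind assumption (2) can be sketched as follows, assuming Python with NumPy; the function name, the tolerance and the small design matrix are illustrative choices rather than part of the original method. The sketch uses the modified Gram-Schmidt variant and drops any column that is linearly dependent on those already processed.

```python
import numpy as np

def gram_schmidt(X, tol=1e-10):
    """Orthonormalize the columns of X, dropping linearly dependent ones."""
    basis = []
    for x in np.asarray(X, dtype=float).T:
        v = x.copy()
        for q in basis:
            v -= (q @ v) * q          # remove the component along q
        norm = np.linalg.norm(v)
        if norm > tol * max(1.0, np.linalg.norm(x)):
            basis.append(v / norm)    # keep only genuinely new directions
    return np.column_stack(basis)

# Hypothetical design: a column of ones followed by two group dummies
# whose sum equals the first column, so one column is redundant.
X_raw = np.array([[1, 1, 0],
                  [1, 1, 0],
                  [1, 0, 1],
                  [1, 0, 1]])
X = gram_schmidt(X_raw)
```

With the column of ones placed first, the first column of the result has all elements equal to $1/\sqrt{n}$, as noted in the text.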

3. The Present Method

The essence of the present method is to

(a) Select c+1 quadratic forms $Q_j(y)$ in the elements of y.

(b) Use the method of synthesis (Hartley (1967), Rao (1968)) to
    obtain the coefficients $k_{ji}$ in the formulas for $E(Q_j)$ in the
    form

    $$E(Q_j) = \sum_{i=1}^{c+1} k_{ji}\,\sigma_i^2 \qquad (3)$$

(c) Estimate $\sigma_i^2$ by equating the computed $Q_j$ to their expectations,
    i.e. by inverting the system (3) to compute the vector $\hat\sigma^2$ with
    elements $\hat\sigma_i^2$,

    $$\hat\sigma^2 = K^{-} Q(y) \qquad (4)$$

    from the vector $Q(y)$ with elements $Q_j(y)$, where $K = (k_{ji})$ with
    rank to be discussed in Sections 6 and 7.

(d) Replace any negative elements of $\hat\sigma^2$ by 0, with consequences to
    be discussed in Section 7.

We now give more details for (a), (b) and (c).

(a) The $Q_j(y)$ will be based on contrasts which do not depend on
    any elements of $\alpha$. Accordingly we orthogonalize all matrices $U_i$
    on X and construct matrices $V_i$ orthogonal on X as follows: Denote
    by $u(t,i)$ the tth column vector of $U_i$ and by $x(r)$ the rth
    column vector of X; then the columns $v(t,i)$ of $V_i$ are given by

    $$v(t,i) = u(t,i) - \sum_{r=1}^{k} x(r)\,\{x'(r)\,u(t,i)\}$$

    or

    $$V_i = U_i - XX'U_i \qquad (5)$$

    We now choose the c+1 quadratic forms $Q_j(y)$

    $$Q_j(y) = y'V_jV_j'y = (V_j'y)'(V_j'y), \qquad j = 1, \ldots, c+1 \qquad (6)$$

(b) It follows from the method of synthesis (see Hartley (1967),
    J. N. K. Rao (1968)) that

    $$E\,Q_j(y) = \sum_{i=1}^{c+1} k_{ji}\,\sigma_i^2$$

    where

    $$k_{ji} = \sum_{t} (V_j'u(t,i))'(V_j'u(t,i)) \qquad (7)$$

    Now since $v(t,j)$ is orthogonal on any $x(\rho)$ (i.e. since
    $v'(t,j)\,x(\rho) = 0$) we can write the $k_{ji}$ in the alternative form

    $$k_{ji} = \sum_{t} (V_j'v(t,i))'(V_j'v(t,i)) = \sum_{t}\sum_{\tau} \{v'(\tau,j)\,v(t,i)\}^2 \qquad (8)$$

    showing that $k_{ji} = k_{ij}$.

    An alternative form of $k_{ji}$ is

    $$k_{ji} = \mathrm{tr}\{(V_jV_j')(V_iV_i')\}. \qquad (9)$$

    We shall show in Section 6 that the symmetrical matrix $K = (k_{ji})$
    will have full rank c+1 if the n x n matrices $V_iV_i'$ are not
    linearly dependent.

(c) We shall also show in Section 6 that the system of equations

    $$Q = K\sigma^2 \qquad (10)$$

    is consistent even if the rank of K is degenerate. Solving (10)
    in the form

    $$\hat\sigma^2 = K^{-} Q \qquad (11)$$

    we shall, of course, be particularly interested in the full rank
    case when $K^{-} = K^{-1}$.
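
Steps (a) to (d) can be sketched numerically as below, assuming Python with NumPy. This is an illustration under stated assumptions, not the authors' program: the function name and the small one-way data set are invented, dense matrices are formed directly (so none of the product-saving devices discussed next are used), and the Moore-Penrose inverse is taken as one particular choice of the generalized inverse $K^{-}$.

```python
import numpy as np

def synthesis_estimates(y, X, U_list):
    """Variance-component estimates following equations (5)-(11).

    y      : length-n response vector
    X      : n x k matrix with orthonormal columns (X'X = I)
    U_list : [U_1, ..., U_c], each n x m_i with a single 1 per row
    The error term U_{c+1} = I is handled implicitly.
    """
    n, k = X.shape
    c = len(U_list)
    # Equation (5): V_i = U_i - X X' U_i
    V = [U - X @ (X.T @ U) for U in U_list]
    K = np.empty((c + 1, c + 1))
    Q = np.empty(c + 1)
    for j in range(c):
        for i in range(c):
            # Equation (7): sum of squares of the submatrix V_j' U_i
            K[j, i] = np.sum((V[j].T @ U_list[i]) ** 2)
        # Equation (13): k_{c+1,j} = tr(V_j V_j'), equal to k_{j,c+1} by symmetry
        K[j, c] = K[c, j] = np.sum(V[j] ** 2)
        # Equation (6), second form
        Q[j] = np.sum((V[j].T @ y) ** 2)
    K[c, c] = n - k                               # equation (12)
    Q[c] = y @ y - np.sum((X.T @ y) ** 2)         # equation (16)
    sigma2 = np.linalg.pinv(K) @ Q                # equation (11), pinv as K^-
    return np.maximum(sigma2, 0.0), K, Q          # step (d)

# Hypothetical unbalanced one-way random layout: groups of sizes 3, 2 and 1.
groups = np.array([0, 0, 0, 1, 1, 2])
y = np.array([4.1, 3.8, 4.4, 5.9, 6.3, 2.7])
n = len(y)
X = np.full((n, 1), 1.0 / np.sqrt(n))             # mean column, X'X = 1
U1 = np.zeros((n, 3))
U1[np.arange(n), groups] = 1.0
sigma2_hat, K, Q = synthesis_estimates(y, X, [U1])
```

Here `sigma2_hat[0]` is the between-group component and `sigma2_hat[1]` the error component; the returned `K` can be inspected for rank before the estimates are trusted.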
It may be helpful to give an idea of the computational efficiency of
the present method by tabulating the number of products involved in the
main operations of the algorithm. To this end we first note simplified
versions for the $k_{c+1,i}$. Observing that $U_{c+1} = I$ we have from (5) that
$V_{c+1} = I - XX'$ and since $X'X = I$ we find that $V_{c+1}V_{c+1}' = I - XX'$ and
finally from (9) that

$$k_{c+1,c+1} = \mathrm{tr}\{(I - XX')(I - XX')\} = \mathrm{tr}(I - XX') = n - k. \qquad (12)$$

Similarly we find that

$$k_{c+1,i} = \mathrm{tr}\{(I - XX')(V_iV_i')\} = \mathrm{tr}(V_iV_i' - XX'V_iV_i') = \mathrm{tr}\,V_iV_i'. \qquad (13)$$

Further we note the form of $V_{c+1}'y$, i.e.

$$V_{c+1}'y = y - XX'y. \qquad (14)$$

Defining now the adjoined matrices

$$U = (U_1 \mid \cdots \mid U_c), \qquad V = (V_1 \mid \cdots \mid V_c) \qquad (15)$$

the bulk of the work consists of the formation of the elements of the
symmetrical matrix $V'V = V'U = U'V$. The elements of this matrix are
assembled in submatrices in accordance with the partition (15) as shown
in Schedule 1 below, where it must be remembered that the range of the
column index t depends on i and is $t = 1, \ldots, m_i$ and the range of
$\tau = 1, \ldots, m_j$, so that the submatrix $V_j'U_i$ has dimensions $m_j \times m_i$. The
$k_{ji}$ for $i \ge j = 1, \ldots, c$ are then obtained by forming the sums of squares
of the elements in each submatrix in accordance with (7).

Schedule 1: Submatrices of V'U

         U_1               U_2               ...    U_c

V_1      v(τ,1)'u(t,1)     v(τ,1)'u(t,2)     ...    v(τ,1)'u(t,c)

V_2                        v(τ,2)'u(t,2)     ...    v(τ,2)'u(t,c)

...                                          ...    ...

V_c                                                 v(τ,c)'u(t,c)

Finally, we recite the formulas for the remaining coefficients in
the equations (10). The $k_{c+1,c+1}$ and $k_{c+1,i}$ are computed from (12) and
(13) respectively and the right hand sides $Q_j(y)$ from the second form
in (6) for $j = 1, \ldots, c$ while $Q_{c+1}(y)$ is given in accordance with (14) by

$$Q_{c+1}(y) = y'y - (X'y)'(X'y) \qquad (16)$$

We can now summarize the approximate number of products involved in
the various operations of the algorithm.

We list the operations and show the associated numbers of products in
parentheses.

1. Orthogonalization of X ($k^+(k^+ - 1)n$, where $k^+$ denotes the
   number of columns in the original matrix X)

2. Computation of $X'U_i$ for $i = 1, \ldots, c$ (0, subtotals of X)

3. Computation of $X(X'U_i)$ for $i = 1, \ldots, c$ from equation (5) ($nmk$)

4. Computation of $U'V = V'V$ in accordance with Schedule 1 (0 products,
   since the elements are subtotals of the elements of the $v(t, i)$)

5. Computation of $k_{ji}$ for $i, j = 1, \ldots, c$ from equation (7) ($\tfrac{1}{2}m(m+1)$)

6. Computation of $k_{c+1,i}$ for $i = 1, \ldots, c$ from equation (13) ($cn$)

7. Computation of $k_{c+1,c+1}$ from equation (12) (0 products)

8. Computation of the $Q_j(y)$ for $j = 1, \ldots, c+1$ from the 2nd form of
   equation (6) and equation (16) ($(m+k+1)(n+1)$)

The important point is that the number of products is only a linear
function of the number of data lines n. An approximate formula for the
total number of products is $n\{k^+(k^+ - 1) + (m+1)(k+1)\}$.
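The approximate total quoted above is easy to tabulate for a given design. The sketch below simply evaluates that formula; the function and argument names are illustrative assumptions, and the quantity returned is the paper's approximation, not an exact operation count.

```python
def approx_products(n, k_plus, k, m):
    """Approximate number of products, n{k+(k+ - 1) + (m + 1)(k + 1)}.

    n      : number of data lines
    k_plus : number of columns in the original X
    k      : number of columns left after orthogonalization
    m      : total number of columns in U_1, ..., U_c
    """
    return n * (k_plus * (k_plus - 1) + (m + 1) * (k + 1))

# Doubling the number of data lines doubles the approximate count.
small = approx_products(n=4, k_plus=3, k=2, m=2)
large = approx_products(n=8, k_plus=3, k=2, m=2)
```

With the dimensions of the numerical example of Section 5 (n = 4, k+ = 3, k = 2, m = 2) the formula gives 4(6 + 9) = 60 products.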

5.  A Numerical  Example 

A small numerical example with n = 4, k+ = 3, k = 2, c = 1, m_1 = 2,
m = 2, m_2 = n = 4 is shown in Schedule 2 below.

Schedule 2: A Numerical Example of a Mixed Model

     y | X original | U_1  |   U_2   |    X new     |     V_1
    ---+------------+------+---------+--------------+--------------
     4 |  1  1  0   | 1  0 | 1 0 0 0 | 1/2    1/2   | +1/2   -1/2
     2 |  1  1  0   | 0  1 | 0 1 0 0 | 1/2    1/2   | -1/2   +1/2
     1 |  1  0  1   | 0  1 | 0 0 1 0 | 1/2   -1/2   |   0      0
     2 |  1  0  1   | 0  1 | 0 0 0 1 | 1/2   -1/2   |   0      0


The orthogonalization of X (original) to X (new) follows the standard
Gram-Schmidt procedure and reduces the k+ = 3 dependent columns to k = 2
columns which are orthogonal and standardized. Note that

$$x(2)_{new} = x(2)_{old} - (1/2)\,x(1)_{old}$$

and that

$$x(3)_{old} = x(1)_{new} - x(2)_{new}$$

must be eliminated.

Using now $x(r) = x(r)_{new}$ we orthogonalize $U_1$ on X and compute (see (5))

$$x'(1)\,u(1,1) = +(1/2), \qquad x'(2)\,u(1,1) = +(1/2)$$

and hence

$$v(1,1) = u(1,1) - (1/2)\,x(1) - (1/2)\,x(2);$$

likewise

$$x'(1)\,u(2,1) = (3/2), \qquad x'(2)\,u(2,1) = -(1/2)$$

and hence

$$v(2,1) = u(2,1) - (3/2)\,x(1) + (1/2)\,x(2).$$

This yields the matrix $V_1$ in Schedule 2 which has only one independent
column. The elements of $V_1'U_1$ require the computation of

$$v(1,1)'u(1,1) = (1/2); \qquad v(1,1)'u(2,1) = v(2,1)'u(1,1) = -(1/2)$$

and

$v(2,1)'u(2,1) = 1/2$, with sum of squares $k_{11} = 4(1/2)^2 = 1$.

Further (equation (12)) $k_{22} = 4 - 2 = 2$ and (equation (13))
$k_{12} = k_{21} = 4(1/2)^2 + 4(0)^2 = 1$, so that the K matrix is given by

$$K = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}.$$

Finally (equation (16)),

$$Q_2(y) = 4^2 + 2^2 + 1^2 + 2^2 - (\tfrac{1}{2}\cdot 9)^2 - (\tfrac{1}{2}\cdot 3)^2 = 25 - \tfrac{90}{4} = 25 - 22.5 = 2.5$$

and (equation (6)) $Q_1(y) = (\tfrac{1}{2}\cdot 2)^2 + (\tfrac{1}{2}(-2))^2 = 2$.

The solution of $Q = K\hat\sigma^2$ therefore yields $\hat\sigma_2^2 = 1/2$, $\hat\sigma_1^2 = 1.5$.
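The arithmetic of this example can be retraced in a few lines. The following is a minimal sketch, assuming matrices stored as lists of columns and plain Gram-Schmidt with removal of dependent columns; the function names are illustrative and the code is not the authors' program.

```python
from math import sqrt

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))

def gram_schmidt(columns, tol=1e-12):
    """Orthonormalize the columns in order, dropping linearly dependent ones."""
    basis = []
    for col in columns:
        resid = list(col)
        for q in basis:
            coef = dot(q, col)
            resid = [r - coef * qi for r, qi in zip(resid, q)]
        norm = sqrt(dot(resid, resid))
        if norm > tol:
            basis.append([r / norm for r in resid])
    return basis

def residual(u, basis):
    """v = u - X X'u for orthonormal columns X, as in equation (5)."""
    v = list(u)
    for q in basis:
        coef = dot(q, u)
        v = [vi - coef * qi for vi, qi in zip(v, q)]
    return v

def worked_example():
    y = [4.0, 2.0, 1.0, 2.0]
    x_old = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1]]  # columns of X (original)
    u1 = [[1, 0, 0, 0], [0, 1, 1, 1]]                   # columns of U_1
    x_new = gram_schmidt(x_old)                         # k = 2 columns survive
    n, k = len(y), len(x_new)
    v1 = [residual(u, x_new) for u in u1]

    k11 = sum(dot(v, u) ** 2 for v in v1 for u in u1)   # equation (7)
    k12 = sum(dot(v, v) for v in v1)                    # equation (13): tr V_1 V_1'
    k22 = n - k                                         # equation (12)
    q1 = sum(dot(v, y) ** 2 for v in v1)                # second form of (6)
    q2 = dot(y, y) - sum(dot(x, y) ** 2 for x in x_new) # equation (16)

    # Solve the 2 x 2 system Q = K sigma^2 by Cramer's rule.
    det = k11 * k22 - k12 * k12
    sigma1_sq = (k22 * q1 - k12 * q2) / det
    sigma2_sq = (k11 * q2 - k12 * q1) / det
    return {"k": k, "K": [[k11, k12], [k12, k22]], "Q": [q1, q2],
            "sigma_sq": [sigma1_sq, sigma2_sq]}

result = worked_example()
```

Cramer's rule is used here only because the system is 2 x 2 and of full rank; a degenerate K would call for the generalized inverse of equation (11).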

6. Optimality Properties and the Consistency of the Equations

The estimators described in Section 3 may be seen to be "best at $\sigma_i^2 = 0$,
$i = 1, \ldots, c$, $\sigma_{c+1}^2 = 1$" as defined by L. R. LaMotte (1973). Therefore,
the consistency of equation (10), regardless of the rank of K, is established
as Lemma 4 by LaMotte (1973). That the estimators defined by (11) are
"best" among invariant quadratic unbiased estimators guarantees that they
are admissible in that class; that is, no other invariant quadratic unbiased
estimators have uniformly less variance for all $\sigma$. Further, as noted by
LaMotte (1973), the estimators (11) have the property that in any model for
which a uniformly best estimator exists, (11) will be uniformly best. Finally, it
may be seen that the "synthesis" estimators (11) are also MINQUE as in
Rao (1971, Section 6) with V = I. No claim is made that this choice of
the norm has any particular merits among the rather general family of the
norms covered by MINQUE formulas. However, it appears to be reasonable to
us that in the absence of any theoretical criteria for selection of MINQUE
norms a norm leading to simple estimators may be regarded as meritorious.

Following Section 4.5 in LaMotte (1973), it may be seen that the rank
of K is equal to the number of linearly independent matrices among $V_i V_i'$,
$i = 1, \ldots, c+1$. Thus a singular K may occur if the $U_i U_i'$ matrices are not
all linearly independent or if there exists (see (5)) a linear combination
of the $U_i U_i'$ matrices whose columns are contained in the linear subspace
spanned by the columns of X. In the first case the singularity is caused
by the design leading to the $U_i$ matrices, while in the second the singularity
is caused by confounding fixed and random effects. In either case, (10)
is consistent but some linear combinations of the variance components
cannot then be unbiasedly estimated. We should stress however that other
special bases of MINQUE (not necessarily invariant to $\alpha$) may also deserve
particular attention.


APPENDIX 2


The Asymptotic Consistency of $\hat\sigma^2$

In discussing the asymptotic behavior of $\hat\sigma^2$ it is of course necessary to
specify the limiting process under which such properties are supposed to hold.
Clearly it is necessary for the consistent estimation of the variances
$\sigma_i^2 = \operatorname{var} b_i$ that the numbers of elements $m_i$ in the vectors $b_i$ all tend
to $\infty$. Since $U_{c+1}$ is the identity matrix we have $m_{c+1} = n$. For the
remaining $m_i$ we assume that their limiting behavior is related to n by

$$L\,n^{1-\alpha_i} \le m_i \le U\,n^{1-\alpha_i} \tag{17}$$

where $0 \le \alpha_i < 1$ and L, U are universal constants. More specifically we assume
$\alpha_{c+1} = 0$ but $\alpha_i > 0$ for $i = 1, \ldots, c$. Generalizations to situations in
which $\alpha_i = 0$ for several components are under consideration.

Denote now by

$$\nu(t, i) = \text{number of elements in } u(t, i) \text{ which are } 1 \tag{18}$$

and

$$\nu(t, i; \tau, j) = \text{number of rows in which both } u(t, i) \text{ and } u(\tau, j) \text{ have elements } 1. \tag{19}$$

Using these concepts we introduce the following conditions of "pseudo
orthogonality" of the u(t, i) vectors. We assume that

$$\ell\,n^{\alpha_i} \le \nu(t, i) \le u\,n^{\alpha_i} \tag{20}$$

(where $\ell$, u are universal constants) and that

$$\nu(t, i; \tau, j) = o(\nu(\tau, j)) \tag{21}$$

for $i \ne j$ with $i = 1, \ldots, c+1$ and $j = 1, \ldots, c$.


The relationship between (17) and (20) is obvious since
$\sum_{t=1}^{m_i} \nu(t, i) = n$, so that (20) implies (17) with $U = 1/\ell$ and
$L = 1/u$, and the stronger condition (20) implies a uniform order of
magnitude for all $\nu(t, i)$ in a given $U_i$. Since the columns of the $U_i$
matrices are orthogonal we have $\nu(t, i; \tau, i) = 0$ for all pairs $t \ne \tau$.
For columns $u(t, i)$, $u(\tau, j)$ with $i \ne j$ condition (21) is satisfied if
there is an asymptotically uniform distribution of the $\nu(t, i)$ rows for
which $u(t, i)$ has elements 1 over a fraction $q\,m_j$ of the columns of $U_j$,
where $0 < q < 1$, since the fraction of $\nu(t, i)$ which gives rise to
$\nu(t, i; \tau, j)$ will be $O(q^{-1} m_j^{-1}) = O(n^{\alpha_j - 1})$ and will tend to zero.

Next we must introduce conditions on the orthogonal standardized matrix X
with elements $x_{sr}$. Denote by $\sum_{s(t,i)} x_{sr}^2$ the sum of $x_{sr}^2$ over those rows
for which u(t, i) has a 1 element; then we assume that

$$\sum_{s(t,i)} x_{sr}^2 = O(n^{\alpha_i - 1}). \tag{23}$$

Since $\sum_s x_{sr}^2 = 1$ and the number of terms in $\sum_{s(t,i)}$ is $\nu(t, i) = O(n^{\alpha_i})$,
condition (23) implies that asymptotically the $x_{sr}^2$ have a uniform
density $x_{sr}^2 = O(n^{-1})$.

Finally we place on record a consequence of conditions (18) to (23); it
follows from (5) using (18), (19), (23) and Schwarz' inequality that

$$u'(t, i)\,v(\tau, j) =
\begin{cases}
\nu(t, i) + O(n^{2\alpha_i - 1}) & \text{for } t = \tau,\ i = j \\
0 + O(n^{2\alpha_i - 1}) & \text{for } t \ne \tau,\ i = j \\
\nu(t, i; \tau, j) + O(n^{\alpha_i + \alpha_j - 1}) & \text{for } i \ne j
\end{cases} \tag{24}$$

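We now turn to the asymptotic behavior of the $k_{ii}$ and $k_{ij}$. From (8),
(17), (20) and (24) we have that

$$\begin{aligned}
k_{ii} &= \sum_{t=1}^{m_i}\sum_{\tau=1}^{m_i} \bigl(u'(t, i)\,v(\tau, i)\bigr)^2 \\
&= \sum_{t=1}^{m_i} \{u'(t, i)\,v(t, i)\}^2 + \sum_{t \ne \tau} \{u'(t, i)\,v(\tau, i)\}^2 \\
&\ge \text{Const}\; n^{1-\alpha_i+2\alpha_i} + O(n^{2-2\alpha_i+4\alpha_i-2}) \\
&\ge C\,n^{1+\alpha_i}
\end{aligned} \tag{25}$$

for all $i = 1, \ldots, c+1$.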

From (8), (17), (19), (21) and (24) we have for $i \ne j$, $i = 1, \ldots, c+1$,
$j = 1, \ldots, c$

$$\begin{aligned}
k_{ij} &= \sum_{t=1}^{m_i}\sum_{\tau=1}^{m_j} \{\nu(t, i; \tau, j) + O(n^{\alpha_i+\alpha_j-1})\}^2 \\
&= \sum_t\sum_\tau \nu(t, i; \tau, j)^2 + O(n^{\alpha_i+\alpha_j-1}) \sum_t\sum_\tau \nu(t, i; \tau, j) + \sum_t\sum_\tau O(n^{2\alpha_i+2\alpha_j-2}) \\
&= \sum_t o(\nu(t, i)) \sum_\tau \nu(t, i; \tau, j) + O(n^{\alpha_i+\alpha_j-1})\,n + O(n^{2-\alpha_i-\alpha_j})\,O(n^{2\alpha_i+2\alpha_j-2}) \\
&= o(n^{1+\alpha_i}) + O(n^{\alpha_i+\alpha_j}) = o(n^{1+\alpha_i})
\end{aligned} \tag{26}$$

since $\alpha_j < 1$. Similarly we prove by symmetry that $k_{ij} = o(n^{1+\alpha_j})$ for $i, j \le c$.
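From (25) and (26) it is clear that for all large n the c x c matrix $k_{ij}$ for
$i, j = 1, \ldots, c$ is asymptotically diagonal with diagonal coefficients
$\ge C n^{1+\alpha_i}$, while the coefficients $k_{c+1,i}$ are asymptotically equal to
$o(n)$. Moreover it is obvious from (12) that $k_{c+1,c+1} \ge Cn$. Using therefore
the first c equations of $K\hat\sigma^2 = Q(y)$ we obtain that

$$\hat\sigma_i^2 = O(n^{-\alpha_i-1})\bigl(Q_i(y) - o(n)\,\hat\sigma_{c+1}^2\bigr)
 = O(n^{-\alpha_i-1})\,Q_i(y) + o(n^{-\alpha_i})\,\hat\sigma_{c+1}^2
 \quad \text{for } i = 1, \ldots, c. \tag{27}$$

Substituting (27) in the last equation we obtain

$$\hat\sigma_{c+1}^2 \Bigl\{Cn + \sum_{i=1}^{c} o(n^{1-\alpha_i})\Bigr\} = Q_{c+1}(y) + \sum_{i=1}^{c} Q_i(y)\,o(n^{-\alpha_i}) \tag{28}$$

so that

$$\hat\sigma_{c+1}^2 = O(n^{-1})\,Q_{c+1}(y) + \sum_{i=1}^{c} o(n^{-\alpha_i-1})\,Q_i(y). \tag{29}$$

Substituting (29) back in (27) we obtain

$$\hat\sigma_i^2 = O(n^{-\alpha_i-1})\,Q_i(y) + o(n^{-\alpha_i-1})\,Q_{c+1}(y). \tag{30}$$

Equations (29) and (30) show that $\hat\sigma^2$ is estimable from the $Q_i(y)$. They also
show that $\hat\sigma^2$ is consistent provided we can show that

$$\operatorname{Var} Q_r(y) = o(n^{2\alpha_r+2}) \ \text{ for } r = 1, \ldots, c, \qquad
\operatorname{Var} Q_{c+1}(y) = o(n^2) \tag{31}$$

since $\operatorname{Cov}\bigl(Q_i(y), Q_j(y)\bigr) = O\bigl(\operatorname{Var} Q_i(y)^{1/2}\,\operatorname{Var} Q_j(y)^{1/2}\bigr)$.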

In order to prove the first result in (31) we use formulas [22], [32],
[33] and [34] of J.N.K. Rao (1968) with slightly altered notation. Formula [22]
gives $E\,Q_r^2(y)$ in the form

$$E\bigl(Q_r(y)^2\bigr) = 2\sum_{i<j=1}^{c+1} c_{ij}\,\sigma_i^2\sigma_j^2 + \sum_{i=1}^{c+1} c_{ii}\,\sigma_i^4 + \sum_{i=1}^{c+1} h_i\,\mu_{4i} \tag{32}$$

where $\mu_{4i} = E\,b_i^4$ are the 4th moments of the elements of $b_i$. Noting that
$\operatorname{Var} Q_r(y) = E\,Q_r(y)^2 - E^2(Q_r(y))$, the leading terms of $c_{ii}$ and $c_{ij}$ given by
J.N.K. Rao's equations [33] and [32] cancel and we are left to consider the
orders of magnitude of

$$\begin{aligned}
c_{ii} - 2h_i &= \sum_{t<\tau=1}^{m_i} \bigl(Q_r(u(t, i) + u(\tau, i)) - Q_r(u(t, i)) - Q_r(u(\tau, i))\bigr)^2 \\
&= \sum_{t<\tau=1}^{m_i} \Bigl(\sum_{s=1}^{m_r} 2\,\bigl(u(t, i)'v(s, r)\bigr)\bigl(u(\tau, i)'v(s, r)\bigr)\Bigr)^2.
\end{aligned} \tag{33}$$

Consider first the case r = i. We distinguish the two terms with $s = t$ and
$s = \tau$. For those two terms $(u(t, i)'v(s, i))\,(u(\tau, i)'v(s, i))$ is from (24)
of the order of magnitude $O(n^{\alpha_i})\,O(n^{2\alpha_i-1}) = O(n^{3\alpha_i-1})$. For the remaining
terms in $\sum_{s=1}^{m_r}$ the product is of the order $O(n^{4\alpha_i-2})$ but the number of
terms is of the order $O(n^{1-\alpha_i})$, so that $\{\sum\}^2$ is $O(n^{6\alpha_i-2})$ and hence (33) is
$O(n^{2-2\alpha_i})\,O(n^{6\alpha_i-2}) = O(n^{4\alpha_i}) = o(n^{2\alpha_r+2})$ since $\alpha_i < 1$.

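Consider next the case $r \ne i$ and $r \ne c+1$. We have from (33) and (24)

$$\begin{aligned}
c_{ii} - 2h_i &= \sum_{t<\tau}^{m_i} \Bigl(\sum_s \bigl(\nu(t, i; s, r)\,\nu(\tau, i; s, r) + O(n^{2\alpha_i+2\alpha_r-2}) \\
&\qquad\qquad + O(n^{\alpha_i+\alpha_r-1})\,(\nu(t, i; s, r) + \nu(\tau, i; s, r))\bigr)\Bigr)^2 \\
&= \sum_{t<\tau}^{m_i} \Bigl\{o(\nu(s, r)) \sum_s \nu(t, i; s, r) + O(n^{2\alpha_i+\alpha_r-1}) \\
&\qquad\qquad + O(n^{\alpha_i+\alpha_r-1})\,(\nu(t, i) + \nu(\tau, i))\Bigr\}^2 \\
&= \sum_{t<\tau}^{m_i} \bigl\{o(n^{\alpha_r+\alpha_i}) + O(n^{2\alpha_i+\alpha_r-1})\bigr\}^2 \\
&= o(n^{2+2\alpha_r}) + o(n^{\alpha_i+2\alpha_r+1}) + O(n^{2\alpha_i+2\alpha_r}) \\
&= o(n^{2+2\alpha_r}).
\end{aligned} \tag{34}$$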


The case $r \ne i$, $r = c+1$ follows on the same lines as (34) except that $\alpha_r = 0$
and that $\nu(t, i; s, c+1)\,\nu(\tau, i; s, c+1) = 0$, since u(s, r) has a 1 only in the
s-th row and either u(t, i) or u($\tau$, i) has a zero in that row. The order of
magnitude of $\{\sum\}$ will therefore be $O(n^{2\alpha_i-1})$ and $c_{ii} - 2h_i$ will be
$O(n^{2\alpha_i}) = o(n^2)$.

The treatment of the $c_{ij}$ in J.N.K. Rao's formula [33] follows on similar
lines to the above proof for the $c_{ii}$ if of the two alternatives i < j, j < i
in (21) the smaller of $\alpha_i$, $\alpha_j$ is selected for majorizations.

It remains to consider the terms

$$h_i = \sum_{t=1}^{m_i} Q_r^2(u(t, i)) = \sum_{t=1}^{m_i} \Bigl\{\sum_{s=1}^{m_r} \bigl(u'(t, i)\,v(s, r)\bigr)^2\Bigr\}^2.$$

For the case r = i we have using (24)

$$\begin{aligned}
h_i &= \sum_{t=1}^{m_i} \Bigl\{\bigl(u'(t, i)\,v(t, i)\bigr)^2 + \sum_{s \ne t} \bigl(u'(t, i)\,v(s, i)\bigr)^2\Bigr\}^2 \\
&= \sum_{t=1}^{m_i} \bigl(O(n^{2\alpha_i}) + O(n^{3\alpha_i-1})\bigr)^2 \\
&= O(n^{1+3\alpha_i}) + O(n^{4\alpha_i}) + O(n^{5\alpha_i-1}) \\
&= \begin{cases}
o(n^{2\alpha_i+2}) = o(n^{2\alpha_r+2}) & \text{for } i = r \ne c+1, \\
o(n^2) & \text{for } i = r = c+1.
\end{cases}
\end{aligned}$$


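For the case $i \ne r$ and $r \ne c+1$

$$\begin{aligned}
h_i &= \sum_{t=1}^{m_i} \Bigl\{\sum_{s=1}^{m_r} \bigl(\nu(t, i; s, r) + O(n^{\alpha_i+\alpha_r-1})\bigr)^2\Bigr\}^2 \\
&= \sum_{t=1}^{m_i} \Bigl(\sum_{s=1}^{m_r} o(\nu(s, r))\,\nu(t, i; s, r) + O(n^{\alpha_i+\alpha_r-1}) \sum_s \nu(t, i; s, r) \\
&\qquad\qquad + O(n^{1-\alpha_r})\,O(n^{2\alpha_i+2\alpha_r-2})\Bigr)^2 \\
&= \sum_{t=1}^{m_i} \bigl\{o(n^{\alpha_i+\alpha_r}) + O(n^{2\alpha_i+\alpha_r-1})\bigr\}^2 \\
&= o(n^{\alpha_i+2\alpha_r+1}) + o(n^{2\alpha_i+2\alpha_r}) + O(n^{3\alpha_i+2\alpha_r-1}) \\
&= o(n^{2+2\alpha_r}).
\end{aligned}$$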

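Finally for $r = c+1$, $i \ne r$ we have

$$\begin{aligned}
h_i &= \sum_{t=1}^{m_i} \Bigl\{\sum_{s=1}^{n} \bigl(\nu(t, i; s, r) + O(n^{\alpha_i-1})\bigr)^2\Bigr\}^2 \\
&= \sum_{t=1}^{m_i} \Bigl\{\sum_s \nu(t, i; s, r)^2 + \sum_s \nu(t, i; s, r)\,O(n^{\alpha_i-1}) + O(n^{2\alpha_i-1})\Bigr\}^2.
\end{aligned} \tag{38}$$

Now since $\nu(t, i; s, c+1)$ is either 0 or 1 we have that
$\sum_s \nu(t, i; s, c+1)^2 = \sum_s \nu(t, i; s, c+1) = \nu(t, i)$, so that

$$h_i = O(n^{1-\alpha_i})\,O(n^{2\alpha_i}) = o(n^2). \tag{39}$$

Since $\hat\sigma^2$ is unbiased and $\operatorname{Cov}(\hat\sigma^2) \to 0$ as $n \to \infty$ it follows that $\hat\sigma^2$ is
consistent. Moreover if we replace any negative $\hat\sigma_i^2$ by 0 the resulting statistic,
say $S^2$, has a smaller mean square error and hence is also consistent.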
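The truncation that turns $\hat\sigma^2$ into $S^2$ is a one-line operation. A minimal sketch, with an illustrative function name and made-up input values:

```python
def truncate_negative(components):
    """Replace any negative variance component estimate by 0."""
    return [max(0.0, float(s)) for s in components]

# Hypothetical estimates; the second component came out negative.
s_squared = truncate_negative([1.5, -0.2, 0.5])
```

Non-negative estimates pass through unchanged, so for the numerical example of Section 5 the truncated and untruncated estimators coincide.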

The consistent estimator $S^2$ may serve as a starting value for the
iterative maximum likelihood estimation procedure described by Hemmerle
and Hartley (1973). Under certain regularity conditions (not discussed
here) one single cycle of the iteration will result in asymptotically
efficient estimators of $\sigma^2$ and $\alpha$. If the iteration is carried to convergence
solutions of the ML equations are reached. If no ML cycles are performed
a consistent estimator $\hat\alpha$ of $\alpha$ can be computed from the generalized least
squares (ML) equations

$$\hat\alpha = (X'H^{-1}X)^{-1}(X'H^{-1}y) \quad \text{where} \quad
H = I + \sum_{i=1}^{c} \frac{\hat\sigma_i^2}{\hat\sigma_{c+1}^2}\,U_i U_i'. \tag{40}$$

It has been shown by Hemmerle and Hartley (1973) that (40) can be computed
directly from the $X'U_i$ matrices without the inversion of the
n x n matrix H using their so called W transformation. In fact the $W_0$
matrix (their equation (19)) is essentially given by the $V_j'V_i$ matrices
(see the above Schedule 1) and by the contrasts $V_i'y$ required in the computation
of $Q_i(y)$.

The variance covariance matrix of $\hat\alpha$ can likewise be computed through
the W transformation.
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MEASURE  OF  EFFECTIVENESS  FOR  DIVISION  LEVEL  MODELS 
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ABSTRACT 

High level excursions, using the Division Battle Model (DBM) or a
similar game, are expected to become more important in the performance of
future Cost and Operational Effectiveness Analyses (COEAs). It is therefore
necessary that a good Measure of Effectiveness (MOE) for use with
these games be developed. Certain MOE, such as the force exchange ratio
or other ratios, have become accepted as providing good estimates of the
results of high resolution, company/battalion level combat simulations.
Efforts have also been made to develop analytical weighting systems for
the different weapons in order to compute weighted MOE. Both of these
methods have been used to analyze the outcome of DBM, a low resolution
division level war game, but neither has been entirely satisfactory. It
is hoped that this paper will stimulate interest and further investigation
into the analysis and interpretation of combat simulation results.


The TRADOC Systems Analysis Activity (TRASANA) has recently completed
a major weapon system study, using a division level war game as one of the
analysis tools. In the course of this work, the problem of finding a proper
measure of effectiveness to distinguish between the competing alternatives
arose. This problem, of course, is common to all studies using models
or simulations, but it does take on some different aspects at division level
than at company/battalion level. A broader way of stating the problem, and
perhaps the better way in the long term, is: How should a model or experiment
be designed in order to distinguish between competing weapon systems?

Since it is not possible to do complete field testing on every proposed
weapon system, the use of simulations has been an important part of
the test and selection process. Now there is a growing interest in using
war games, which have been used principally as training aids in the past,
as analysis tools. A war game may be defined as a combat simulation that
is characterized by manual interplay and takes place in a simulated combat
environment. This paper describes in some detail the war game used in the
TRASANA study and demonstrates the dilemma faced in attempting to apply the
"accepted" measures of effectiveness to the results. It is hoped that this
presentation will both identify and lead to further investigation of a
problem area that is critical to the weapon system evaluation process.

The model used was the Division Battle Model. It is a computer-assisted,
manual war game developed by the General Research Corporation (GRC) and is
designed to support studies of the performance of weapons, organizations,
and tactics employed by a division sized force. Figure 1 describes DBM
schematically. The study was primarily concerned with the ground combat
portion of the game, which is linked to two other GRC models: CARMONETTE,
a stochastic, high resolution, company/battalion simulation, and COMANEX,
an extension of classical Lanchester theory. COMANEX is both a stand
alone simulation and the ground combat assessment routine in DBM.

CARMONETTE's primary activities include the movement of units, the
detection of targets, and the firing of weapons. Unit resolution is
variable from individual weapon system to platoons. The model is critical
event sequenced with time recorded to one ten-thousandth of a minute. The
spatial representation is variable but a 100 meter grid is normally used.
Inputs to the model are detailed descriptions of the units being played,
performance characteristics of the various weapon types, a set of orders
for each unit, including movement and target priorities, target detection
probabilities, and a detailed description of the terrain. The unit orders
must be based on a predetermined scenario and on a specified tactical
doctrine, either current or one to be tested. The terrain description,
required by grid square, includes average elevation, height of vegetation,
cover and concealment. Output from a CARMONETTE run is a computer listing
of every event assessed during the battle which includes the elements
killed, various operational statistics, and information on engagement
ranges. Various summary routines may be used to collect the data in
preparation for further analysis. In preparing CARMONETTE output for use
as DBM ground combat history, a sufficient number of replications of
each scenario must be made to develop good estimates of battle outcome.

DBM is a game rather than a simulation. It is played on a tactical
type map of scale 1:25,000 to 1:50,000 which provides sufficient detail
to support the levels of unit, time, and space resolution employed. For
the TRASANA study, it was more practical to resolve to the company level
for the Blue reinforced division and to the battalion level for the
attacking Red combined arms army, but different levels may be used
depending on the gamers' purpose. Space is measured to the nearest hundred
meters. Time may be measured to the nearest five minutes, but it was
found that to the nearest quarter hour was generally sufficient. While
the game can be played in open, semi-closed, or closed modes depending
on the degree to which intelligence is considered a critical factor, it
has principally been used only in the open mode. In this way, two to
four hours of battle time can be gamed per working day by a
player/controller team.

The manual operations of DBM consist mainly of decision making, event
determination and time sequencing, while the computerized portion focuses
on the determination of battle losses, tabulation and reporting of battle
results, and updating of stored information. Manual play takes place
over approximately four-hour increments of battle time but may be stopped
sooner if the control team determines that a critical event has occurred.
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Figure  1.  Diagram  of  Division  Battle  Model 


At that point, computer input is prepared, describing the various combat
actions that occurred during the manual phase. The computer routines
then assess the casualties and provide a printout showing losses, cause
of loss, and past and present unit strength. The control team makes
necessary adjustments to unit locations and notifies the players of the
battle outcome, after which manual play is resumed.

In order to provide the necessary background, the ground combat
assessment routine must be described in some detail. The routine COMANEX
solves a set of Lanchester type equations for the different weapon
systems involved. These are shown for the simple case of one Blue and
one Red weapon system. It may be noted that these equations reduce to
the Lanchester square law for the case where all targets are acquired,
that is, as the P's approach 0, and to the Lanchester linear law as the P's
approach unity, or no targets are acquired. COMANEX then treats combat
situations between these two extremes of the Lanchester formulation.
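The two limiting cases described above can be illustrated numerically. The sketch below is not the COMANEX formulation, whose equations are not legible in this copy; it assumes, purely for illustration, attrition of the form dB/dt = -r R (1 - P_B^B) and dR/dt = -b B (1 - P_R^R), chosen because it reproduces the stated limits (square law as the P's approach 0, linear law as they approach unity). The function name, the Euler step and all numerical values are likewise assumptions.

```python
def attrition_step(blue, red, b, r, p_blue, p_red, dt):
    """One Euler step of an ASSUMED attrition model (not taken from the source).

    blue, red : current numbers of Blue and Red weapons
    b, r      : kill rates of one Blue / one Red weapon, given acquisition
    p_blue    : probability a specific Blue target is unacquired by a Red firer
    p_red     : probability a specific Red target is unacquired by a Blue firer
    """
    # Probability that a Red firer has acquired at least one of the Blue targets,
    # assuming independent acquisition of each target.
    red_fires = 1.0 - p_blue ** blue
    blue_fires = 1.0 - p_red ** red
    new_blue = max(blue - dt * r * red * red_fires, 0.0)
    new_red = max(red - dt * b * blue * blue_fires, 0.0)
    return new_blue, new_red

# All targets acquired (P = 0): losses follow the square law, dB/dt = -r R.
square_law = attrition_step(100.0, 100.0, b=0.01, r=0.02,
                            p_blue=0.0, p_red=0.0, dt=1.0)

# P close to 1: losses are approximately r (1 - P) R B, the linear law.
linear_law = attrition_step(10.0, 50.0, b=0.01, r=0.02,
                            p_blue=0.9999, p_red=0.9999, dt=1.0)
```

A simple Euler step is used only to keep the sketch short; any standard ODE integrator could be substituted.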

These equations are easily generalized to the case of several Blue and
Red weapon types as is shown by the following equations:

Homogeneous Forces

    -dB/dt = ...

    -dR/dt = ...

    b   = Rate at which one Blue weapon kills Red weapons given
          acquisition of at least one target.

    P_B = Probability that a specific Blue target is unacquired by
          an individual Red firer.

    B   = Number of Blue weapons at time t

    Similar definitions for r, P_R and R

Heterogeneous Forces

    ...

    b_ij = Rate at which one type i Blue weapon kills Red weapons
           of type j

    B_i  = Number of type i Blue weapons at time t

    P_B  = The same as for Homogeneous Forces

The values of the b_ij, P_B, and P_R are calculated by a COMANEX
preprocessor from the results of each high resolution scenario. These are then
used by the DBM ground combat assessment routine, the COMANEX simulator,
to solve the equations and develop the results of battle groups using
different but similar force structure from that used in the original
CARMONETTE work. The validity of COMANEX in reproducing the results of
CARMONETTE and in predicting the outcome of different scenarios has been
tested both by the developer and at TRASANA and has been shown to be
quite good. While these models simulate combat more or less realistically
depending on our point of view, perhaps more from the point of
view of a high level staff officer, devastatingly less from the aspect
of an infantry private, they alone say nothing about effectiveness. In
actual combat, the critical, in fact the only, measure of effectiveness
is mission accomplishment. Models are not as inflexible.

In high resolution simulations, the win or lose criteria may be
difficult to define and quite arbitrary if it is done. Typically,
battalion level simulations are not stopped at a logical breakpoint but
are carried to extremes (e.g., 90% Red system losses) that distort both
time and system losses. After making all of the necessary model runs
for each weapon system, the analyst will analyze all of the data to
identify a logical breakpoint. This "analysis point" is seldom driven
by tactical consideration (if it were, it could be specified beforehand)
but rather by the necessity to find a point in the model output
where all of the competing systems can be "objectively compared."

The typical numerical output from a simulation is in the form of
a killer-victim scoreboard as is shown in Figure 2. These may be
developed as frequently as is desired or practical during the simulation
and provide a summary of the battle events.

Figure 3 shows some of the traditional type measures of effectiveness
used with killer-victim scoreboard data. The loss exchange ratio
and force exchange ratio are often used with CARMONETTE type simulation.
The total tank ratio and tank contribution are less common but have still
been seen.
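The four ratios of Figure 3 can be written down directly from killer-victim scoreboard totals. The sketch below is illustrative only: the function and argument names are assumptions, the counts are made up, and the force exchange ratio is taken as the fraction of Red systems lost divided by the fraction of Blue systems lost.

```python
def loss_exchange_ratio(red_lost, blue_lost):
    """LER: Red systems lost per Blue system lost."""
    return red_lost / blue_lost

def force_exchange_ratio(red_lost, red_initial, blue_lost, blue_initial):
    """FER: fraction of Red force lost divided by fraction of Blue force lost."""
    return (red_lost / red_initial) / (blue_lost / blue_initial)

def total_tank_ratio(red_tanks_killed, blue_tanks_killed):
    """TTR: Red tanks killed per Blue tank killed."""
    return red_tanks_killed / blue_tanks_killed

def tank_contribution(red_killed_by_blue_tanks, blue_tanks_killed):
    """TC: Red systems killed by Blue tanks per Blue tank killed."""
    return red_killed_by_blue_tanks / blue_tanks_killed

# Hypothetical scoreboard totals, not data from the study.
ler = loss_exchange_ratio(red_lost=60, blue_lost=20)
fer = force_exchange_ratio(red_lost=60, red_initial=300,
                           blue_lost=20, blue_initial=100)
```

With these made-up counts the LER of 3 looks favorable to Blue, while the FER of 1 shows that both sides lost the same fraction of their force; the FER is the LER divided by the initial Red-to-Blue force ratio.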

When one computes the value of an MOE at an analysis point, the
difficulties are usually just beginning. If different values for the MOE
are found (as is desired; if multiple MOE are used, it is also desired
that any differences are in the same direction), some determination must
be made about the significance of the differences. If a stochastic
model such as CARMONETTE is being used, one can of course conduct a
statistical significance test providing there is some knowledge about
the distribution of the model output. If not, non-parametric statistics
can be used. If, on the other hand, a deterministic battalion level
model is being used, a difference of 10% is the accepted figure for
significance. If no significant difference can be shown in the MOE, it
is hoped that the model has provided enough "valuable insights" to come
to a decision on the best (preferred) system.
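For a stochastic model run with a handful of replications per alternative, one non-parametric option is an exact two-sample permutation test on the MOE values. The sketch below shows that one possible choice; it is not a procedure described in the paper, and the function name and sample values are assumptions.

```python
from itertools import combinations

def permutation_p_value(sample_a, sample_b):
    """Exact two-sided permutation test for a difference in mean MOE.

    Enumerates every way of splitting the pooled replications into groups of
    the original sizes, so it is practical only for small numbers of runs.
    """
    pooled = list(sample_a) + list(sample_b)
    n_a, n_b = len(sample_a), len(sample_b)
    total = sum(pooled)
    observed = abs(sum(sample_a) / n_a - sum(sample_b) / n_b)
    extreme = 0
    count = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:   # tolerance guards against rounding
            extreme += 1
        count += 1
    return extreme / count

# Hypothetical MOE values from three replications of each alternative.
p_value = permutation_p_value([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

With three replications per alternative there are only 20 possible splits, so the smallest attainable two-sided p-value is 0.1; this is one way to see why a sufficient number of replications is needed before differences between alternatives can be called significant.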

When analyzing the results of a division level model, things are
not as clear cut. First, it is difficult to use any of the traditional
ratio type MOE because the force ratios are constantly changing with
the intensity of the battle and the tactical decisions being made by
the players. Analysis points can be identified as some arbitrary
fraction of survivors (or losses) of the total force available and then the
ratio type MOE may be used, but the problem here is that varying numbers
of forces actually participate. In simulations at company/battalion
level, a certain force is committed initially and fights to the
conclusion, with the entire battle taking place in a time frame of
approximately one-half hour or less. In contrast to this, a division game may
require a period of one to four days of combat time, while the intensity
varies not only with time, but also with space along the division front.
The numbers of engaging forces change as a result of both combat
attrition and the tactical decisions made, such as commitment of the reserve
or withdrawal of a unit to another position.

The strong point of the division game, however, is that tactical
stopping points can be easily identified prior to the start of the game.
For the TRASANA study the end of game criterion was simply mission
accomplishment by Red or Blue. The game was stopped when Red
accomplished his mission by penetrating the Blue rear boundary or when
Blue accomplished his mission by causing Red to break off the attack and
go on the defensive. It was fortunate in the study that there were three
distinct outcomes for our three leading candidates. With one candidate
Blue lost; with the second, he prevented a penetration, but at a cost of
an entire division. However, with the third candidate Blue not only
prevented the breakthrough but had the capability to mount a strong
counter-attack. Even with results that diverse, a quantitative MOE is
required if only to have something to use with cost comparisons.
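As a point of reference, the traditional ratio type MOE are simple arithmetic on loss counts. The following Python sketch computes the loss exchange ratio and the force exchange ratio at a single analysis point. The class name `BattleTally`, the function names, and all counts are hypothetical and chosen only for illustration; they do not come from the TRASANA study.

```python
from dataclasses import dataclass


@dataclass
class BattleTally:
    """Hypothetical container for the counts needed at one analysis point."""
    red_lost: int
    blue_lost: int
    red_initial: int
    blue_initial: int


def loss_exchange_ratio(t: BattleTally) -> float:
    # LER = Red systems lost / Blue systems lost
    return t.red_lost / t.blue_lost


def force_exchange_ratio(t: BattleTally) -> float:
    # FER = (Red lost / initial Red) / (Blue lost / initial Blue)
    return (t.red_lost / t.red_initial) / (t.blue_lost / t.blue_initial)


# Illustrative counts only.
tally = BattleTally(red_lost=30, blue_lost=10, red_initial=120, blue_initial=60)
ler = loss_exchange_ratio(tally)
fer = force_exchange_ratio(tally)
print(f"LER = {ler:.2f}, FER = {fer:.2f}")
```

In a division game the initial counts would have to be replaced by whatever forces are actually engaged at each analysis point, which is where the difficulty described above arises.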


TRADITIONAL MOE

Loss Exchange Ratio (LER)

LER = (Number of Red Systems Lost) / (Number of Blue Systems Lost)

Force Exchange Ratio (FER)

FER = (Number of Red Systems Lost / Initial Number of Red Systems)
      / (Number of Blue Systems Lost / Initial Number of Blue Systems)
    = (Loss Exchange Ratio) / (Engaging Force Ratio)

Total Tank Ratio (TTR)

TTR = (Red Tanks Killed) / (Blue Tanks Killed)

Tank Contribution (TC)

TC = (Red Systems Killed by Blue Tanks) / (Blue Tanks Killed)

Figure 3

Figure 4 shows an example of the Force Exchange Ratio calculated
for each alternative for the previous manual interval at various times
during the game. Comment is unnecessary on the difficulty of using
this as an MOE.

Figure 5 shows the Loss Exchange Ratios for the same case. Here
the curves have been smoothed by taking cumulative values throughout
the course of the battle, but the differences lie only in the relative
positions of the curves and are still difficult to interpret. The
arrows show points of equal Red losses.

Simulations have long been used as test beds for weapon systems;
in contrast, war games have traditionally been used as training aids.
It is becoming recognized that the games, particularly high level ones,
have a legitimate use in the analysis process. In fact, TRASANA and
the Combined Arms Center at Fort Leavenworth are devoting considerable
joint effort toward improving existing games and developing new ones
for use in both training and analysis. Use of the game does, however,
present some problems in experiment design and data interpretation that
have not been fully explored.

ANALYSIS OF RATIO DATA FROM FIELD EXPERIMENTATION


Brian  Barr 

US  Army  Combat  Developments  Experimentation  Command 
Fort  Ord,  California 


ABSTRACT. Measures of effectiveness which result from taking
the ratio of two dependent variables are difficult to analyze. The
problem becomes further complicated when the data come from field
experimentation where the data are rarely "clean".

Examples of the type of data involved are presented along with
the reasons why the data cannot be analyzed using standard techniques.
The analysis approach of looking at the numerator and denominator
separately is discussed along with the reasons why this technique cannot
be universally applied to ratio data.

I. INTRODUCTION. The Combat Developments Experimentation
Command (CDEC) conducts field experiments for the U.S. Army. These
experiments quite often take the form of instrumented force-on-force
field tests in which one tactical unit engages another in a relatively
free play environment. The instrumentation permits the collection of
detailed data on the engagement sequences as they occur. Normally,
four or five independent variables are controlled, but the number of
uncontrolled or nuisance variables can be almost infinite.

Examples of the types of measures of effectiveness that have
been used in previous experiments include the ratio of red kills to
blue kills, the ratio of detections to engagements, the ratio of targets
exposed to detections, and the ratio of ammunition expended to hits or
kills. One ratio in particular that has appeared repeatedly is the
casualty exchange ratio, the ratio of red kills to blue kills. (Many
arguments can be presented for and against using this as a measure of
effectiveness. Without getting into that topic, it should suffice to
say that this MOE has appeared before and will probably continue to
be used.)


II. THE PROBLEM. The problems with analyzing the casualty
exchange ratio from field experimentation data start before the
calculation of the MOE. The first problem is that the sample size is
usually severely limited by practical constraints (field experiments
are extremely expensive). Time and cost constraints quite often
overshadow statistical considerations and the analyst must do the
best with what he is given. The sample size is further complicated
because up to 25 percent of the trials may be invalidated due to
operational problems or instrumentation failures. When these trials
cannot be rerun the result is unequal sample sizes. The sample sizes
may also be unbalanced by the nature of the MOE. The sample size of
the ratio of targets to detections, for example, is dictated by the
number of detection opportunities which randomly appear during the
field trial.

A typical field experiment design might look like this:


The independent variables A and B have three and two levels respectively.
Two more variables may be nested equally in the cells and an undetermined
number of nuisance variables may appear during execution. (In PARFOX VII,
for example, with 54 trials, nine variables could be shown to influence
the dependent variable.) These nuisance variables normally result in
great variability of the data within each cell.

The two elements of the ratio MOE are rarely if ever independent
of one another. The number of red players who have been killed obviously
influences the number of blue players who will be killed. Also the
distribution of the number of kills on either side is usually skewed in
one direction and often truncated by an arbitrary end of trial criterion.
Thus, the distribution is rarely normal.

III. PAST TRIALS. CDEC has been relatively successful in analyzing
casualty exchange ratio data by using analysis of covariance techniques;
however, this has only been possible because the basic statistical question
of how to test hypotheses on ratio data has been avoided. Instead of
analyzing the ratio, the numerator and the denominator have been analyzed
independently, then conclusions have been drawn from the results of these
two analyses.

The  approach  taken  so  far  has  followed  the  following  logic: 


If R1 is greater than R2 and if B1 is smaller than B2, then the ratio
R1/B1 must be greater than the ratio R2/B2. Likewise, if R1 equals
R2, and B1 is smaller than B2, then R1/B1 is greater than R2/B2; or
if B1 equals B2, and R1 is greater than R2, then R1/B1 is greater than
R2/B2. This logic doesn't appear to bother anyone until the case where
R1 is greater than R2 and B1 is greater than B2 (so far CDEC has not
had this appear, but it would seem to be just a matter of time). Look
at the possibilities:

(a) R1/B1 = R2/B2

(b) R1/B1 > R2/B2

(c) R1/B1 < R2/B2

In each case, the statistical testing on the separate variables tells us
the same thing (R1 is greater than R2 and B1 is greater than B2); but the
ratios are equal, greater, and smaller respectively.
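The three possibilities can be illustrated with a few lines of Python. The kill counts below are hypothetical numbers chosen only so that R1 exceeds R2 and B1 exceeds B2 in every case; they are not CDEC data, and the function name `compare_ratios` is likewise an assumption of this sketch.

```python
from fractions import Fraction


def compare_ratios(r1: int, b1: int, r2: int, b2: int) -> str:
    """Return '=', '>' or '<' for R1/B1 versus R2/B2 using exact arithmetic."""
    first, second = Fraction(r1, b1), Fraction(r2, b2)
    if first == second:
        return "="
    return ">" if first > second else "<"


# Hypothetical (R1, B1, R2, B2) counts, one tuple per case.
cases = {"a": (4, 2, 2, 1), "b": (6, 2, 2, 1), "c": (4, 3, 2, 1)}

results = {}
for label, (r1, b1, r2, b2) in cases.items():
    # Separate tests on numerator and denominator see the same pattern each time.
    assert r1 > r2 and b1 > b2
    results[label] = compare_ratios(r1, b1, r2, b2)
    print(f"({label}) {r1}/{b1} {results[label]} {r2}/{b2}")
```

Separate analyses of the numerator and the denominator cannot distinguish these cases, because the inputs to both analyses show the same ordering in all three.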

An additional consideration that will not be discussed but should
be mentioned is the case where we have:


The ratios are equal, but obviously the battles are not identical since
the casualties on both sides vary by a factor of 10.

IV. SUMMARY. Every indication points to the fact that ratio
type measures of effectiveness will continue to appear in field
experimentation. Literature searches have failed to reveal acceptable
solutions to the analysis of ratio data, and eventually the case will
arise where the separation of numerator and denominator will no longer be
adequate. Further work needs to be conducted in this area, both to
strengthen Army field experimentation and to benefit the whole statistical
community.

PHYSIOLOGICAL AND PERCEPTUAL ADAPTATION TO
SUSTAINED AND MAXIMAL WORK IN YOUNG WOMEN

D. Kowal¹, D. Horstman¹, and L. Vaughan²

¹Exercise Physiology Division, US Army Research Institute of Environmental Medicine,
Natick, MA

²Department of Physical Education, Wellesley College, Wellesley, MA

In order to better understand differences in physical work
performance between men and women, a study was carried out to determine (a) whether
women perceive physical effort differently than men; (b) whether previous activity
experience influences the perception of effort; and (c) how acute and chronic training
affect the perception of effort and the ability for prolonged work in women.
Preliminary analysis suggests that perceived exertion in women is influenced by
activity history and self concept prior to participation in aerobic training. The
perceptual measures displayed a substantial interaction depending upon self
concept/prior activity and group affiliation of these women. Psychological estimates
of physical self concept improved for the previous low activity training group but not
for the previous high activity training group when compared to controls.

Background:

Presently, about 5% of the workforce of the US Army is comprised of women,
the highest percentage in peacetime history. This figure will increase substantially
within the next few years with a projected contingent of 50,000 women soldiers.

The role of the Army's women has also undergone drastic change; whereas
women were previously confined to less physically demanding tasks (such as clerical
work), all Military Occupation Specialties are presently available to women, with the
exception of combat arms. With the prospect of increasing numbers of women
serving in a greater variety of work roles, our interests have focused on the
performance of prolonged physical work by women. Sustained performance of
physical work is governed by two distinct factors: (a) one's capacity for work and
(b) one's willingness to endure hard physical work. Capacity is objective in nature
and dependent to a large extent upon genetic traits, but can be modified by other
influences (primarily physiological), such as training, diet, and environment
(1,2,3,4). Willingness to endure is more complex and subjective in nature, and
probably governed by psychosocial factors (3,6,7,8).

Given this situation, the question is obvious: Are there physiological or
perceptual differences between men and women that may preclude the latter from
performing sustained heavy work? Currently available research provides little
information. However, observations in our laboratory suggest that, when asked to
perform tests which require a maximum voluntary contraction, women tend to
score less than could be predicted on the basis of physiological indices, e.g., lean
body mass. It has also been reported that women possess approximately half the
arm and shoulder strength of men, 3/4 the leg strength, and 3/4 the aerobic
capacity of the average man (9). Further, we recognize that perception of work is
related to experience. However, because society has often considered women
incapable of strenuous physical work, or considered such work unfeminine, many
women have not experienced it.

This study was designed to evaluate the following questions:

1. Do women perceive work differently than men and are the physiological
and psychological factors related to work capacity the same for both groups?

2. How does prior experience influence the perception of effort and the
capacity for sustained work performance?

3. Do women who have had high activity experience differ from those with
low activity history in their response to training?

Progress:

Seventy-five women volunteers ages 18-22 served as subjects. They were
assigned to one of 3 groups: low previous activity, experimental (N = 14) and
control (N = 13); high previous activity, experimental (N = 13) and control (N = 13);
and an intercollegiate athlete (high fitness) group (N = 13). The following
measurements were made during the first and last week of the program. Anthropometric
measurements were made of height and weight. Body composition was determined
by measuring skin fold thickness with a Harpenden caliper at four anatomical sites:
triceps, biceps, subscapular, and suprailiac. An interrupted treadmill test for
maximal aerobic power (VO2 max) was performed following the procedure of Taylor
(10). During the last minute of each run, the expired gas was collected into vinyl
Douglas bags and analyzed for oxygen and CO2 content. Subjects were monitored
electrocardiographically during all runs. VO2 max was determined when the oxygen
uptake did not increase with an increase in work load. At the end of each run the
subject rated her perceptual response during the workload using Borg's rating of
perceived exertion (RPE). The RPE is a ratio scale from 6-20 with verbal labels: 6
= very, very light to 20 = very, very hard. The treadmill test for aerobic fitness
was performed during the first week of study (Pre-training), a week later (Acute)
and following the 12-week training program (Post-training). These replications
were performed to assess changes that may have occurred in both physiological and
perceptual responses to maximal work, and the aerobic training program. The
training program consisted of 12 weeks during which the women ran for
progressively longer periods of time at a faster pace. Each week a 30 minute test
run was performed to assess improvement in stamina and endurance. During the
pre- and post-testing sessions the subjects were asked to complete a battery of
cognitive and behavioral self-evaluation questionnaires designed to assess their
attitude toward exercise and their expectations of their physical capacity and
performance.

Anthropometric measures are summarized in Table 1. The findings suggest
that women engaged in an aerobic training program can expect to lose body fat but
gain some weight even though they are maintaining high energy expenditures. This
is attributed to the increase in caloric intake reported by the members of the
training groups. Table 2 summarizes the physiological and perceptual responses to
initial, acute and post-training maximal exercise. The anticipated improvement in
aerobic fitness is evident, with VO2 max increasing 8% for the high
activity group and approximately 15% for the low activity group. It is difficult to
equate the perceptions of effort (RPE) reported because of the different workloads
involved at the end of the training program. The other measures of aerobic fitness,
ventilation (VE max), maximum heart rate (HRmax) and maximum workload
(speed/grade), also showed the anticipated improvement as a result of training.

Table 3 describes the physiological and perceptual responses to a 20 minute
endurance run at 70% of VO2 max. While the first two endurance runs were based
on initial VO2 max values, the post-training 70% workload was calculated based on
the subjects' post-training VO2 max; i.e., absolute workload was increased by 8 to
15% for the groups. It can be seen that the perceptual responses to the same
workload (pre-acute) were quite different. This finding suggests that exposure and
activity experience alone may play an important role in understanding work
performance in women even if no training is involved.

Data analysis of these physiological and perceptual measures across
replications of the maximal performance and endurance tests is in progress. It can be
seen in Table 4 that psychological measures of attitude toward activity, physical
self-estimation, hidden shapes, motor satisfaction, perceived control of the
environment (lack of control) and physical self concept did not demonstrate
substantial differences between the high and low activity groups. However, it is
noteworthy that many of these measures were apparently different from the
college norm population scores. This could be expected in light of the activity
experience of the latter group.

In general the preliminary data analysis indicates that perceptual responses
are intricately involved in the development of physical work capacity in women.
Comparison of differences in peripheral responses between women and men will be
reported subsequently. The population studied appears to be rather unique and
superior to the college norms, making psychological comparisons difficult; however,
additional analysis is in progress.

TABLE 1. Anthropometric Characteristics of Women with Different Activity
Patterns Before and After an Endurance Training Program.

TABLE 4. Performance Expectations and Self Evaluation of
Physical Abilities in Women of Different Activity Patterns

THEORY OF LEAST CHI-SQUARE FOR POLYNOMIALS:
IMPLICATION FOR DESIGN OF EXPERIMENTS


Richard  L.  Moore* 

US  Army  Armament  Research  and  Development  Command 
System  Evaluation  Office 
Dover, NJ 07801


ABSTRACT. This paper extends the least chi-square theory
(which was previously developed [1] for fitting data to non-linear
functions of the parameters) to fitting polynomial functions of an independent
variable. The underlying concept is that a chi-square is minimized.
This chi-square is the ratio of the sum of the squares of the residuals to
the variance of the instrumental error plus the sum of the ratios of squares
of an appropriate number of autocorrelation coefficients (with delay times
which are integral increments of the interval between observations) to their
variances.

The normal equations are extensions of, and reduce to, the ordinary
least squares when the autocorrelation coefficients are zero. Iterative
solution is required since the sum of squares of residuals and the autocorrelation
coefficients depend on the values of the parameters. Two different approaches
for the iterative solution have been programmed for a commercial
programmable calculator. Typical results will be presented.

Effective use of this theory requires measurement of instrumental
errors, and if appropriate, randomization of the order in which the
independent variable(s) are varied.

The use of the theory is expected to give a set of values of the
parameters which are "more probable" than those determined by ordinary least
squares. It is expected to be "robust" to outliers and give an estimate of
the probability that a particular outlier came from the same population as
the other observations.

i 

I. INTRODUCTION. The aim of the investigations which led to this
paper was to find a better method to estimate the parameters in
mathematical models of physical phenomena. Several assumptions are inherent in
such a problem. Two of them were essential in our considerations:


*Based partially on work done in Logistics Executive Development Course,
USA Logistics Management Center, Ft. Lee, VA.

First: The mathematical model or models under test are completely
specified by a priori knowledge; only the parameters are unknown.*

Second: The errors are assumed to be measurement errors, and
independent means are available (and have been used) to determine the
precision of the measurement devices whose variance is given as $\sigma_e^2$. These
measurement errors are assumed to be independent, and thus to form a
random sequence.

Because of the first assumption, we do not permit ourselves to use the
established statistical curve-fitting procedure of generalized least squares
in which the variance-covariance matrix is transformed to a diagonal matrix.
The procedure is rejected because in effect it changes the mathematical
model to a different model, in which "periodic" terms are added to account
for the observed values of the autocorrelation of the errors.

Because of the second assumption, we must provide a test as to
whether, in fact, the errors remaining after the parameters have been
estimated are consistent with a random generation of errors with a variance of
$\sigma_e^2$, and if at the same time the autocorrelations observed are consistent with
a random sequence of errors.

This last criterion is essential from an experimental point of view since,
try as he may, the experimenter may not have succeeded in eliminating all
sources of bias. To help him determine whether he has done so, many tests
of the residuals are available [3,4]. However, these tests are essentially
go/no go, and offer no method to improve the estimate of the parameters by
reducing the autocorrelation.

Our object is to provide a data reduction method which will give a
single test to answer the question: What is the probability that the set of
residuals corresponding to a given set of parameters arose by a random
sequence from a population with variance $\sigma_e^2$? Given this probability, can
the probability be increased by a change in the parameters?


*Most (if not all) basic theories of physics can be derived from the
least-square principle. This principle was stated by Gauss in 1828, and has
recently been confirmed by Moore [2]. Because of this fact, it would be
inappropriate to add additional terms to the physical theory merely to
reduce the autocorrelation.

II. CONSIDERATION OF CRITERIA. In considering what statistical
criteria could or should be used for our purpose, several well-known
criteria such as "run" probability, error normality, etc. as considered by
Anscombe [4], were proposed but were rejected either because a given
test was not expressible easily in terms of the residuals and thus in terms
of the parameters, or it was not directly applicable to the question of
interest.

Evidently some form of chi-square test would be desirable in view
of the well-known fact that the sum of squares of the residuals follows a
chi-square distribution. The variance-covariance matrix ($V_c^{-1}$) was
considered as a candidate by using the following (where $(1\,1\,1\ldots)$ is a column
vector and $(1\,1\,1\ldots)'$ is its transpose) (see Aitken [5]):

$$(1\,1\,1\ldots)'\, V_c^{-1}\, (1\,1\,1\ldots) = n \sum_{ij} \sigma_{ij}^2 \tag{1}$$

The expected value of this expression is just $n\sigma^2$. Because $\sigma_{ij}^2$ where $i \neq j$
can be either negative or positive and because of the tendency for alternating
positive and negative values in some cases, this expression was found to be
unsatisfactory for a chi-square test.

The next criterion which could be used is $(1\,1\,1\ldots)'\, (V_c^{-1})^2\, (1\,1\,1\ldots)$,
which equals $n\,(1 + \sum r_i)^2\, \sigma^2$, whose expected value is

If one should expand this square, on the assumption of $r_i$ being not
correlated with $r_j$ (consistent with our second assumption), one might expect
the sum of the cross product terms to vanish, leaving only the sum of the
squares. If this is the case, then the sum of the squares criterion (an
alternate which follows) should be a more sensitive criterion.

A third alternative is the combination of (a) the "F" test of the
variance of the residuals, where the measurement variance $\sigma_e^2$ is the
standard against which the sample sum of squares is compared, and (b) the
Box-Pierce test of the sum of the squares of the autocorrelation coefficients
divided by the individual variance $V_j$ (Box and Pierce [6]).

The chi-square formed by combining these two tests is a single test
of the joint probability of a given value of the sum of the squares and the
corresponding values of the autocorrelation coefficients arising by chance
from a particular set of estimates of the parameters.

The mathematical process to find the parameters which maximize the
probability that both the "F" test and the autocorrelation test are satisfied
will be called the "least chi-square method." Its derivation follows.
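To make the combined statistic concrete, the following Python sketch evaluates it for a given vector of residuals: the sum of squared residuals divided by the measurement variance, plus the squared autocorrelation coefficients divided by their variances. The function names are hypothetical. The sketch assumes that the lag-j autocorrelation is computed from the raw (not mean-centered) residuals as the lagged sum of products divided by the sum of squares, and that the variances of the coefficients are supplied by the caller; the value 1/n used in the example is only the usual large-sample approximation.

```python
import numpy as np


def autocorrelations(d, m):
    """Return r_1..r_m with r_j = sum_i d_i * d_(i+j) / sum_i d_i**2 (assumed form)."""
    d = np.asarray(d, dtype=float)
    total = d @ d
    return np.array([(d[:-j] @ d[j:]) / total for j in range(1, m + 1)])


def combined_chi_square(d, sigma_e2, m, var_r):
    """Sum of squares over sigma_e^2, plus sum of r_j^2 over their variances."""
    d = np.asarray(d, dtype=float)
    r = autocorrelations(d, m)
    return (d @ d) / sigma_e2 + np.sum(r**2 / var_r)


# Illustrative residuals with a strong alternating pattern.
residuals = np.array([1.0, -1.0, 1.0, -1.0])
n = residuals.size
chi2 = combined_chi_square(residuals, sigma_e2=1.0, m=1, var_r=1.0 / n)
print(chi2)
```

For these residuals the first term is 4 and the lag-1 coefficient is -0.75, so the autocorrelation term contributes 0.5625 / 0.25 = 2.25 and the total is 6.25. The alternating signs are penalized even though the sum of squares alone is unremarkable.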

III. LEAST CHI PROCEDURE. In this derivation we will follow the
procedure and most of the notation of Aitken [5] for generalized least squares:

Let the representation of the vector of data:

$$u = \{u(x_1),\ u(x_2),\ \ldots,\ u(x_n)\} \tag{2}$$

by the vector:

$$y = \{y(x_1),\ y(x_2),\ \ldots,\ y(x_n)\} \tag{3}$$

be linear in terms of a set of assumed functions

$$p_1(x),\ p_2(x),\ \ldots,\ p_{k+1}(x). \tag{4}$$

These functions are restricted only by the condition that they must be
linearly independent over the n values of x.

If we let P be the matrix of these functions, the ith row of P is the
row vector

$$P_i = [\,p_1(x_i),\ p_2(x_i),\ \ldots,\ p_{k+1}(x_i)\,]. \tag{5}$$

In this event, P is of the order of n x (k+1).
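For the polynomial case, the matrix P can be built directly from the values of x. The sketch below assumes the monomial basis 1, x, x^2, ..., x^k, which is only one possible choice of linearly independent functions; the function name `design_matrix` is hypothetical.

```python
import numpy as np


def design_matrix(x, k):
    """Return the n x (k+1) matrix whose ith row is [1, x_i, x_i**2, ..., x_i**k]."""
    x = np.asarray(x, dtype=float)
    return np.vander(x, N=k + 1, increasing=True)


x = np.arange(1, 6)          # five equally spaced integer abscissas
P = design_matrix(x, k=2)    # quadratic: k + 1 = 3 columns
print(P.shape)
print(P[2])                  # row for x = 3

# Linear independence over the n values of x means P has full column rank.
rank = np.linalg.matrix_rank(P)
```

The rank check corresponds to the condition that the assumed functions be linearly independent over the n values of x; it fails, for example, if fewer than k+1 distinct values of x are available.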

Let $\theta^*$ denote a column vector of k+1 coefficients independent of x,
such that

$$\theta^* = \{\theta_1^*,\ \theta_2^*,\ \theta_3^*,\ \ldots,\ \theta_{k+1}^*\} \tag{6}$$

(The asterisk symbol * will be used to indicate an estimate of the indicated
symbol where convenient. However, it will not be used on complex
expressions involving $\chi_T^2$, $\sigma_e^2$, and $\Gamma$ because of typographical difficulties.) By
definition then the vector y is $P\theta^*$ and we let the vector d be:

$$d = u - y = u - P\theta^*. \tag{7}$$

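If $\chi_T^2$ is defined in the first way considered, i.e., $(d)'(d)/\sigma_e^2$ plus the
covariance normalized to $\sigma_e^2$, it is

$$\chi_T^2 = \sigma_e^{-2}\, d'd + \sigma_e^{-4} \left\{ \left[ d'\, (V_c^{-1})^2\, d \right] - (d'd)^2 \right\} \tag{8}$$

and let

$$(V_c^{-1})^2 = \Big[\, d\, \Big( I + \sum_{j} V_j^{-1} \Big)\, d' \,\Big]^2 \tag{9}$$

In this expression $V_j^{-1}$ is defined as follows:

$$
V_1^{-1} = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \end{pmatrix}, \quad
V_2^{-1} = \begin{pmatrix} 0 & 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & & \ddots & \vdots \end{pmatrix}, \quad \ldots, \quad
V_j^{-1} = \begin{pmatrix} 0 & \cdots & 0 & 1 & 0 & \cdots \\ 0 & \cdots & 0 & 0 & 1 & \cdots \\ \vdots & & & & & \ddots \end{pmatrix}
\tag{10}
$$

In these, the subscript "j" indicates a unit value in each of the ith rows and
(i + j)th column. Thus equation (8) becomes

$$\chi_T^2 = \sigma_e^{-2} \Big\{ d'd + \sigma_e^{-2}\, d' \Big[ d' \Big( 2 \sum_{j=1}^{m} V_j^{-1} + \sum_{j=1}^{m} V_j^{-2} \Big) d \Big] d \Big\} \tag{11}$$

The partials of equation (11) with respect to $\theta_r$ are clearly a complex
expression when compared to the method which follows this discussion, so
that further analysis is not presented.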


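If the chi-square is taken as the final alternative, and $V_j$ is the
variance of $r_j^*$, then:

$$\chi_T^2 = \sigma_e^{-2}\, d'd + \sum_{j=1}^{m} r_j^2\, V_j^{-1} \tag{12}$$

Following the generalized principle of least squares, the partials of $\chi_T^2/2$
with respect to $\theta^*$ are set equal to zero:

$$\frac{\partial (\chi_T^2/2)}{\partial \theta^*} = \sigma_e^{-2} \left\{ [P' \Gamma P \theta^*] - P' \Gamma u \right\} = 0 \tag{13}$$

where $\Gamma$, $\alpha_j$ are defined in terms of the unit matrix I and the factors d, $r_j$,
and $V_j^{-1}$ as follows. Let

$$\alpha_j = \frac{2\, r_j\, V_j^{-1}}{(d)'(d)/\sigma_e^2 - 2 \sum_{i=1}^{m} (r_i)^2\, V_i^{-1}} \tag{14}$$

$$\Gamma = I + \sum_{j=1}^{m} \alpha_j V_j^{-1} \tag{15}$$

Here $V_j^{-1}$ in the expression for $\alpha_j$ is the reciprocal of the variance $V_j$ of
$r_j$, while in the expression for $\Gamma$ it is the matrix defined in (10).

Solving for $\theta^*$, we find

$$\theta^* = [P' \Gamma P]^{-1} P' \Gamma u.$$

Since $\Gamma$ depends on the values of $r_j$ and d'd, which again depend on $\theta^*$,
the values of $\theta^*$ must be determined iteratively, with each iteration being
used to determine $r_j$ and d'd until the values converge.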
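The iteration just described can be sketched in Python with NumPy. This is one reading of the procedure under stated assumptions, not the calculator program used by the author. It assumes the monomial basis 1, x, ..., x^k, starts from the ordinary least squares estimate, and takes $r_j = d'V_j^{-1}d / d'd$ with $V_j^{-1}$ the matrix of equation (10). It also replaces each matrix $V_j^{-1}$ by its symmetric part, which leaves $d'V_j^{-1}d$ unchanged and makes $\Gamma$ symmetric; that substitution is an assumption of this sketch. The function names, the simulated data, and the choice of three lags are all hypothetical.

```python
import numpy as np


def lag_matrix(n, j):
    """Symmetric part of V_j^{-1}: 1/2 at positions (i, i+j) and (i+j, i)."""
    upper = np.eye(n, k=j)                 # ones in row i, column i + j
    return 0.5 * (upper + upper.T)


def least_chi_square(x, u, k, sigma_e2, m, var_r, max_iter=100, tol=1e-10):
    """Iterate theta = (P' G P)^{-1} P' G u, recomputing G from the residuals."""
    x = np.asarray(x, dtype=float)
    u = np.asarray(u, dtype=float)
    n = x.size
    P = np.vander(x, N=k + 1, increasing=True)     # assumed basis 1, x, ..., x**k
    lags = [lag_matrix(n, j) for j in range(1, m + 1)]
    theta = np.linalg.lstsq(P, u, rcond=None)[0]   # ordinary least squares start
    for _ in range(max_iter):
        d = u - P @ theta                          # residuals
        dd = d @ d
        r = np.array([d @ L @ d for L in lags]) / dd   # autocorrelation coefficients
        alpha = 2.0 * (r / var_r) / (dd / sigma_e2 - 2.0 * np.sum(r**2 / var_r))
        gamma = np.eye(n) + sum(a * L for a, L in zip(alpha, lags))
        new_theta = np.linalg.solve(P.T @ gamma @ P, P.T @ gamma @ u)
        converged = np.max(np.abs(new_theta - theta)) < tol
        theta = new_theta
        if converged:
            break
    return theta


# Simulated straight-line data with independent errors (illustrative only).
rng = np.random.default_rng(0)
x = np.arange(1, 31)
u = 1.0 + 0.5 * x + rng.normal(0.0, 0.2, size=x.size)
theta = least_chi_square(x, u, k=1, sigma_e2=0.04, m=3, var_r=1.0 / (x.size - 4))
print(theta)
```

When the variances of the autocorrelation coefficients are made very large, every $\alpha_j$ tends to zero, $\Gamma$ tends to the unit matrix, and the estimate reduces to ordinary least squares. The sketch makes no attempt to guard against a vanishing denominator in $\alpha_j$ or against non-convergence.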

IV. LEAST CHI SQUARE FOR FUNCTIONS WHICH ARE NOT LINEAR IN
THE PARAMETERS. In the previous paper [1] the expression for a new
estimate of the parameters has been derived for "the least chi square"
procedure. That derivation will be understood by the present notation as
follows:

Let $y_i^*$ denote the elements $y(x_1, \theta^*),\ y(x_2, \theta^*),\ \ldots,\ y(x_n, \theta^*)$

and let $u_i^* = u_i - y_i^*$.

Define the matrix $P^*$ as the matrix whose ith row is

$$\frac{\partial y_i^*}{\partial \theta_1^*},\ \frac{\partial y_i^*}{\partial \theta_2^*},\ \ldots,\ \frac{\partial y_i^*}{\partial \theta_{k+1}^*}.$$

Let

$$\{d^*\} = P^* [\delta\theta^*] - u^*. \tag{18}$$

From this it is clear that $d^*$, $P^*$, $\delta\theta^*$, and $u^*$ may be substituted for
d, P, $\theta^*$, and u in the formula for $\theta^*$ so that

$$[\delta\theta^*] = [P^{*\prime}\, \Gamma\, P^*]^{-1} P^{*\prime}\, \Gamma\, u^*. \tag{19}$$
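Equation (19) has the form of a weighted Gauss-Newton correction. The Python sketch below applies it to a hypothetical two-parameter model, y = a exp(b x), using noise-free data so that the behavior is easy to follow. The model, its derivative matrix, the starting values, and the function names are assumptions made for illustration. For simplicity $\Gamma$ is held fixed at the unit matrix here, whereas in the full procedure it would be recomputed from the residuals on every pass.

```python
import numpy as np


def model(x, theta):
    """Hypothetical model, non-linear in its parameters: y = a * exp(b * x)."""
    a, b = theta
    return a * np.exp(b * x)


def derivative_matrix(x, theta):
    """P*: the ith row holds dy_i/da and dy_i/db evaluated at the current estimate."""
    a, b = theta
    e = np.exp(b * x)
    return np.column_stack([e, a * x * e])


def correction(x, u, theta, gamma=None):
    """Solve [P*' G P*] dtheta = P*' G u* for the parameter correction."""
    u_star = u - model(x, theta)
    p_star = derivative_matrix(x, theta)
    if gamma is None:
        gamma = np.eye(x.size)      # simplification: Gamma fixed at the unit matrix
    return np.linalg.solve(p_star.T @ gamma @ p_star, p_star.T @ gamma @ u_star)


x = np.linspace(0.0, 2.0, 8)
true_theta = np.array([2.0, 0.3])
u = model(x, true_theta)            # noise-free data for illustration

theta = np.array([1.8, 0.28])       # starting estimate near the true values
for _ in range(20):
    theta = theta + correction(x, u, theta)
print(theta)
```

With exact data the correction vanishes once the estimate reproduces the observations, so the iteration settles at the values used to generate the data.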


V. EXPLICIT EXPRESSION FOR POLYNOMIAL LEAST-CHI SQUARE.
Equation (11) can be explicitly expressed in terms of $x_i$, $u_i$, and $\alpha_j$ if
$p_i(x)$ are polynomials. For computing purposes this may be desirable
since the various "moments" can be evaluated from the data ($u_i$) and from
the values of the independent variable $x_i$ in several ways.

To calculate the matrix elements explicitly, let the value of $\Gamma$ be
$I + \sum_j \alpha_j V_j^{-1}$ as in (15). From this expression the matrix elements of the
equation

$$P' \Gamma u = P' \Gamma P \theta^*$$

are calculated and the results are given in Figure (1). (Note that in Fig (1)
y is used as the vector of the observed data instead of $u_i$ as previously.)
The matrix terms include the ordinary least square terms plus the added
terms as may be seen by inspection of each term. The added terms can be
distinguished from the ordinary terms by the fact that each of the added
terms is proportional to $\alpha$. The calculation of the "moments" can be done
in a variety of ways. Assuming that $x_i$ are equally spaced integers, two
different approaches have been used to program a Texas Instruments
programmable computer (SR-52). These were:


(a) Calculation and storage of all the "moments," calculation of $\chi_T^2$
and $\alpha_p$ from assumed values of $\theta^*$, followed by calculation of the matrix
elements, and concluding with a new estimate of $\theta^*$ by a standard ordinary
least squares routine such as the Texas Instruments "Trend Analysis
Program." This program calculates the new values of $\theta_i^*$ by the usual
techniques for solving simultaneous linear equations.


(b) A second way is to calculate the residuals $d$ from an initial
estimate of $\theta_i^*$. From them calculate $(d)'(d)$ and $(d)'V_j(d)$. From these two,
$\chi_T^2$ and $\alpha_p$ are calculated, followed by the matrix elements for the trend
analysis program and then the values of $\theta^*$.

VI. EXAMPLES. The first example uses the data on "national paperboard
production per quarter" given by Butler, Kavesh, and Platt [7]. This
case illustrates the situation where serial correlation due to seasonal effects
is present, and offers a comparison between ordinary least squares and
least chi square. The second example uses data on the gross national product
[8], in a case where the "eyeball test" indicates that a linear least squares
fit is not adequate. The purpose of studying this case is to provide an
example where a priori one would not expect a good fit.

In all cases, 30 data points were used. The variance of the
autocorrelation squared was taken as approximately $(n-4)^{-1}$, and the expected
value for $\sqrt{2\chi_T^2}$ was assumed to be $\sqrt{2(n+s-q)}$, where $n$ is 30, $s$ is 3,
and $q$ is 2. The validity of this formula as compared with alternatives, such
as one where the degrees of freedom are $n+s-2q$, is not important for these
cases.

Table 1 shows the results of the calculation. For each case, as
designated under the "DATA SOURCE" column, values were estimated for the
variance of the measurement error. Under the "initial" and "final" columns
are given the estimates of $\theta_0$, $\theta_1$, $\chi_1^2$, $\chi_2^2$, and $\chi_T^2$. Using the final
values of $\sqrt{2\chi_T^2}$, an estimate of its deviation (A) in multiples of the
standard deviation from the expected value $E(\sqrt{2\chi_T^2})$ is obtained.
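
The deviation A described above amounts to a one-line calculation. The sketch
below assumes the expected value $\sqrt{2(n+s-q)}$ and the unit variance
stated in the text; the chi-square input is a made-up number, not a value from
Table 1, and the function name `deviation_a` is hypothetical.

```python
import math

def deviation_a(chi2_total, n, s, q):
    """Deviation of sqrt(2*chi^2) from its assumed expected value.

    The expected value is taken as sqrt(2*(n + s - q)) and the variance of
    sqrt(2*chi^2) as unity, so the difference is already in standard deviations.
    """
    expected = math.sqrt(2.0 * (n + s - q))
    return math.sqrt(2.0 * chi2_total) - expected

# n, s, q as stated in the text; chi2_total = 50 is an illustrative input only.
a_value = deviation_a(chi2_total=50.0, n=30, s=3, q=2)
```

A total chi square equal to n + s - q (here 31) gives A = 0; larger totals
give positive deviations.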

The first case of Gross National Product (Fig. 2) used a straight line
fitted by eye to the data as the initial estimate. The second case used
ordinary least squares as the initial estimate. Starting from ordinary least
squares gave the same final estimate of the parameters after one iteration
as the initial "eyeball" fit did after two iterations. (There was no change
between the second and the third.) The amount of calculation would be
somewhat less with ordinary least squares as the initial point. The eyeball
fit was used to check the ability of the program to converge when given an
initial condition which was not the "best" estimate.

The third case of the GNP used an estimate of the measurement
variance of the GNP four times that initially estimated.
The same initial "eyeball" estimate was used as before, and a rapid iteration
to nearly the same final values of the parameters resulted. The
large  value  of  Xi*  nearly  always  dominated  the  value  of  Xi* . 

The initial estimate for the "Paperboard Production" case was taken from
the result given by the authors using many more data points. This
estimate was: $\theta_0$ is 3671.8 and $\theta_1$ is 74.12. A change of variables was
made for convenience as follows:

$$y' = 0.2\,y - 760 \tag{20}$$

Using this value as "normalized" production, the initial and final values
are given in Table 1. In terms of the original parameters the estimates
of $\theta_0$ and $\theta_1$ are 3712.5 and 72.25, respectively, for case A. For case B
they are 3715.0 and 72.15.
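
The change of variables (20) and its inverse can be checked with a short
calculation. This sketch assumes the fitted model is a straight line
y = theta0 + theta1 * x with x left unchanged, so that the intercept is scaled
and shifted while the slope is only scaled; the function names are
hypothetical.

```python
def to_normalized(theta0, theta1):
    """Map line coefficients for y to those for y' = 0.2*y - 760."""
    return 0.2 * theta0 - 760.0, 0.2 * theta1

def to_original(theta0_n, theta1_n):
    """Invert the change of variables (20)."""
    return (theta0_n + 760.0) / 0.2, theta1_n / 0.2

# Initial estimate quoted in the text, expressed in normalized units.
t0_n, t1_n = to_normalized(3671.8, 74.12)
# Round trip back to the original units.
t0, t1 = to_original(t0_n, t1_n)
```

Applied to the quoted initial estimate, the normalized intercept is about
-25.64 and the normalized slope about 14.824.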

In case A, the initial estimate of the parameters was changed by
the iteration so that the least squares error became smaller and the sum
of the squares of the autocorrelation coefficients became larger. The
final parameters of case A were used as the initial estimate for case B,
but the estimated variance of measurement was increased by a factor of
ten. The iteration procedure produced a change in the final value such
that the sum of the squares ($\chi_1^2$) was slightly increased, but the
autocorrelation decreased. This is the only case studied where "A" is less
than one standard error.

The reasons for the large values of A follow for each case. For
the GNP cases, the linear model is obviously insufficient to fit the data.
Making allowance for a larger estimate of the measurement error does
not compensate for the correlation of the residuals. We conclude that the
GNP data are not satisfactorily fitted by the assumed linear model.

For the Paperboard Production cases the "measurement" variance
is larger than 100 but probably less than 1,000. Because there has
been no attempt in the present study to adjust for "seasonal" fluctuations,
which may be real, the "seasonal" fluctuation represents an additional
(and correlated) error in each quarterly estimate. Further analysis
will be done for this case when a computer of larger storage capacity
than the one used in this study is available.

To investigate the possibility that outlier rejection would be
assisted by this technique, the 25th data point was chosen by Monte Carlo
techniques and a $-3\sigma_e$ deviation from the original fitted line was
introduced. Two cases were calculated using this set of data: in the first
case ordinary least squares was used to initiate the calculations; in the
second case an initial estimate of the values of the parameters near
the fitted line of the unmodified set of data was used.

The final parameter estimates agreed in both cases and the values
of both $\chi_1^2$ and $\chi_2^2$ greatly increased. The result is that "A" became
greater than 2.3 as compared to the previous result of 0.68. (The
variance of $\sqrt{2\chi_T^2}$ is unity.)

Thus we find this test sensitive to a single outlier, which indicates
that further study of this technique should be done.

Questions such as the relation to the ARMA technique [9] have
not yet been investigated.

VII. SUMMARY. It was observed that the fit criterion, $\chi^2$, was
improved in each case from the ordinary least squares value by the
iteration procedure. In this process the chi square of the autocorrelation
coefficients was always reduced from that which occurred at minimum
variance of the errors, at the expense of permitting a slight increase in
the variance of the errors.

Based  on  this  result  and  on  the  theory  of  the  tests,  least  chi 
square  gives  an  improved  estimate  of  the  parameters  as  compared  to 
ordinary  least  squares. 

The convergence was rapid. The number of iterations required
to converge was approximately three. It is yet to be demonstrated that
an "eyeball" initial fit might reduce the number of iterations required,
but it is believed likely.

When performing experiments involving measurements, the measurement
error should be independently observed so that data will be available
to apply the least chi square test if appropriate.

SIMPLIFIED CONSTRUCTION OF BASIS FUNCTIONS FOR POLYNOMIAL SPLINES

J. J. Heimbold

MARK Resources, Inc., Marina del Rey, California

A simple, straightforward procedure is presented for generating
polynomials over a set of contiguous intervals. The polynomials can be constructed
to be continuous or to have an arbitrary number of derivatives continuous
across the interval boundaries (knots). The constructed functions are
ordinary polynomial splines of given degree with any specified number of
derivatives continuous across the boundaries.

A minimum mean-square error criterion in fitting the spline polynomials
to a set of data points requires solving a set of linear equations. In
actual applications it is efficient to express the polynomial splines as
a set of basis functions, which simplifies the solution of the linear equations.
A set of spline basis functions is presented which does simplify the solution
to the minimum mean-square fit. The functions are created in such a way that
many pairs of basis functions are mutually orthogonal. In addition they are
ordered in a way that results in a banded matrix in the set of linear
equations. Both of these properties lead to a numerically simple solution
and a reasonably small amount of computer storage.

MOTIVATION 

The need to construct splines grew out of a requirement to obtain
trajectory estimates from noisy radar data. It was known that some of the
trajectories could be modeled by fourth to sixth degree polynomials over
short time intervals, and it was desired that the range, velocity, and
sometimes the acceleration or higher-order range derivatives be continuous
across interval boundaries.

As a result of the need for the trajectory estimates, a technique was
developed for constructing spline basis functions for polynomials of
arbitrary degree with an arbitrary number of constrained derivatives at
the knots.

The motivation for deriving the spline basis functions came from the
need to quickly implement a spline program. A survey of the spline literature
found it to be either limited to second or third degree polynomials, or to
be unreadable without a specialized background.

CONSTRUCTION OF POLYNOMIAL SPLINES

It can be shown that a polynomial spline of degree $D$ in $C^{P-1}(x_1, \infty)$ over
the set of strictly increasing knots $\{x_i\}$ can be written as

$$Q(x) = \sum_{j=0}^{D} a_{1j}(x-x_1)_+^j + \sum_{j=P}^{D} a_{2j}(x-x_2)_+^j + \cdots + \sum_{j=P}^{D} a_{N-1,j}(x-x_{N-1})_+^j$$

where

$$(x-x_i)_+^j = \begin{cases} (x-x_i)^j, & x \ge x_i \\ 0, & x < x_i. \end{cases}$$

Expressing the splines in this form yields a concise mathematical
formulation of the splines. The first summation term is a polynomial of
degree $D$ on $(x_1, \infty)$ and is in $C^{P-1}(x_1, \infty)$. The rest of the summation terms
are in $C^{P-1}(-\infty, \infty)$, and hence $Q(x)$ is in $C^{P-1}(x_1, \infty)$.

The functions $(x-x_i)_+^j$ are basis functions for $Q(x)$, and are not necessarily
mutually orthogonal for any two basis functions. Consequently, a direct
computation of a minimum mean-square fit of $Q(x)$ to noisy data will require
matrix storage and inversion for a square matrix with dimension $(N-1)(D+1-P)+P$.
A change of basis functions can reduce the matrix storage requirement if the
basis functions are chosen such that many pairs of the functions are mutually
orthogonal.
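
The size of the direct least-squares problem can be illustrated by building one
row of the design matrix from the truncated power functions. The sketch below
assumes hypothetical knots and degrees, and the helper names
`truncated_power` and `design_row` are introduced only for illustration; it
counts the columns and compares the count with $(N-1)(D+1-P)+P$.

```python
def truncated_power(x, knot, j):
    """(x - knot)_+^j: the ordinary power for x >= knot, zero otherwise."""
    return (x - knot) ** j if x >= knot else 0.0

def design_row(x, knots, degree, p):
    """Values of all truncated-power basis functions at x.

    knots holds x_1, ..., x_{N-1}. The first knot carries powers 0..D,
    and every later knot carries only powers P..D.
    """
    row = [truncated_power(x, knots[0], j) for j in range(degree + 1)]
    for knot in knots[1:]:
        row += [truncated_power(x, knot, j) for j in range(p, degree + 1)]
    return row

knots = [0.0, 1.0, 2.0, 3.0]   # hypothetical knots, so N - 1 = 4
D, P = 4, 2                    # quartic pieces, continuous through derivative P - 1
row = design_row(1.5, knots, D, P)
n_columns = len(row)
expected_columns = len(knots) * (D + 1 - P) + P   # (N-1)(D+1-P) + P
```

At x = 1.5 only the functions attached to the first two knots are nonzero,
yet every column must still be stored, which is the cost the change of basis
is meant to reduce.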


SPLINE  BASIS  FUNCTIONS 

The  basis  functions  which  lead  to  a banded  matrix  are 
?+? 


JB. (n)  - < 


13  i * " * 


i+p+1 


k-1 


V 


otherwise 


where 


ri 


l<m<P+l  ^ X i+P+1  "*Xi+m-l^ 


^ik^i+P+l^i+k-P 


J"1. B&L 


n (xi+o-l"x 

l<m<P+l 

ojfk 


i+k-1) 


J**l,2, . . . ,D+1-P, 
k-1,2, . . . ,P+1,  and 


for 


J 

These basis functions span all terms of the form $(x-x_i)_+^j$; $j = P, \ldots, D$;
$i = 1, \ldots, N-1$. The other terms in the polynomial $Q(x)$, viz., $(x-x_1)_+^j$,
$j = 0, 1, \ldots, P-1$, are spanned by creating $P$ knots $x_{-P+1}, \ldots, x_{-1}, x_0$ which
are strictly increasing with $x_0 < x_1$. Then the set of basis functions
$\{{}_1B_i(x)\}$, $i = -P+1, -P+2, \ldots, 0, 1$, is a set of $P+1$ linearly independent
polynomials of degree $P$ on the interval $[x_1, x_2]$, and hence

$$\sum_{j=0}^{P} a_{1j}(x-x_1)_+^j$$

can be formed as a linear combination of these basis functions.

These basis functions have the property that

$${}_kB_i(x)\;{}_lB_j(x) = 0 \quad \text{for } |i-j| > P+1.$$

They can be ordered as:

$${}_1B_{-P+1},\ {}_1B_{-P+2},\ \ldots,\ {}_1B_0,$$
$${}_1B_1,\ {}_2B_1,\ \ldots,\ {}_{D+1-P}B_1,$$
$$\vdots$$
$${}_1B_{N-1},\ {}_2B_{N-1},\ \ldots,\ {}_{D+1-P}B_{N-1}.$$

With this ordering, the matrix of dot products of the basis functions is
banded with bandwidth $(P+1)(D-P+1)$, $P = 0, 1, \ldots, D$.
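
The banded structure can be seen in a small numerical experiment. The sketch
below does not implement the basis functions of this paper; it substitutes
standard B-splines built with the Cox-de Boor recursion (the case of one basis
function per knot) on hypothetical uniform knots, and approximates the dot
products with a midpoint rule. Function names are illustrative only.

```python
def bspline(i, p, t, x):
    """Cox-de Boor recursion for the i-th B-spline of degree p on knots t."""
    if p == 0:
        return 1.0 if t[i] <= x < t[i + 1] else 0.0
    value = 0.0
    if t[i + p] > t[i]:
        value += (x - t[i]) / (t[i + p] - t[i]) * bspline(i, p - 1, t, x)
    if t[i + p + 1] > t[i + 1]:
        weight = (t[i + p + 1] - x) / (t[i + p + 1] - t[i + 1])
        value += weight * bspline(i + 1, p - 1, t, x)
    return value

def gram_matrix(p, t, samples=20):
    """Approximate the dot products of all basis pairs by a midpoint rule."""
    n_basis = len(t) - p - 1
    g = [[0.0] * n_basis for _ in range(n_basis)]
    for k in range(len(t) - 1):
        h = (t[k + 1] - t[k]) / samples
        for s in range(samples):
            x = t[k] + (s + 0.5) * h
            b = [bspline(i, p, t, x) for i in range(n_basis)]
            for i in range(n_basis):
                for j in range(n_basis):
                    g[i][j] += b[i] * b[j] * h
    return g

P = 3
t = [float(k) for k in range(10)]   # hypothetical uniform knots
G = gram_matrix(P, t)
# Entries far from the diagonal vanish because the supports do not overlap.
outside_band = [G[i][j] for i in range(len(G)) for j in range(len(G))
                if abs(i - j) > P]
```

Only the diagonals within the band need to be stored and factored, which is
the storage saving the ordering above is designed to give.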

VALT PARAMETER IDENTIFICATION FLIGHT TEST


Robert L. Tomaine*, Wayne H. Bryant**, and Ward F. Hodge**

NASA-Langley Research Center
Hampton, Virginia 23665


ABSTRACT 


The Langley Research Center is engaged in a research program to develop
the technology to maximize the capability of helicopter operation in
confined areas. The program, VALT (VTOL Approach and Landing Technology), uses
an integrated approach involving the helicopter, avionics system, control
system, displays, and the pilot. An important task in the study is to
develop an accurate model of the helicopter system for flight control
design and simulation studies. A flight test designed utilizing the VALT
approach profile was performed at the NASA Wallops Island test facility to
obtain data for verifying existing mathematical models through use of
parameter identification techniques. Briefly, parameter identification as
applied to flight vehicles consists of identifying the aerodynamic
coefficients of the vehicle equations of motion utilizing the measured vehicle
states and accelerations resulting from measured control inputs.
Theoretically, these coefficients can be determined very accurately; however, in
actual applications many problems and limitations are encountered. In
addition, the research vehicle used (CH-47) and the VALT flight regime
introduced problems specific to this application. The unique facilities utilized
to minimize these problems for the CH-47 parameter identification flight test
included the CH-47 fly-by-wire control system and onboard computer, the
Wallops Test Center radar tracking system, the Langley Research Center mobile
Research Aircraft Ground Station (RAGS) and Piloted Aircraft Data System
(PADS), and the CH-47 Sperry flight director display.

Data runs were performed to include test points along the entire VALT
approach trajectory, including straight and level flight, straight descending
and ascending flight, and spiral descents. Complete data sets were measured
at 40 sps on PCM recorders and stored on board to include attitudes,
velocities, angular rates, linear accelerations, pilot stick positions,
actuator positions, SAS positions, rotor RPM, and other pertinent information.

In addition to describing the details of this flight program, preliminary
results of parameter identification processing utilizing advanced statistical
methods are presented.


*Structures Laboratory, US Army Research and Technology Laboratories (AVRADCOM)
**National Aeronautics and Space Administration, Langley Research Center

INTRODUCTION


VALT is an acronym for VTOL Approach and Landing Technology (Ref. 1).
It is a comprehensive program including flight management, guidance and
control, and display technology with the ultimate goal of the development
of avionics technology for optimum VTOL short haul transportation in the
1980's time regime. One important task of the VALT program is to develop
an accurate model of the VALT research vehicle, which is required for
guidance and control system design. This paper is concerned with the
approach taken to determine this model.

The method of obtaining an accurate model of the VALT research vehicle
is verification of previously developed analytical models by processing
selected flight maneuver data with advanced parameter identification algorithms.

The VALT flight regime consists of cruise, transition, and hover flight
conditions. Anticipated VALT trajectories include straight and level flight,
straight ascending and descending flight, and spiral descending flight. A
comprehensive flight test program was conducted at the NASA Wallops Flight
Test Center to obtain data for all of the flight conditions anticipated for
the VALT trajectories.

Parameter identification of flight vehicles consists of disturbing the
test aircraft with a known control input to produce a response in the vehicle
states, which are measured as a function of time (see Fig. 1). Given a form
for the vehicle model (plant) and the measured states, the algorithms compute
the coefficients (stability and control derivatives) of the model. The
equation set governing the identification process is as follows:

$$\dot{X} = A(p)X + B(p)U$$

where X refers to the vehicle state vector, U is the control input vector,
and A and B are the stability and control matrices which compose the
assumed plant. The plant is the equations of motion of the vehicle.

The general identification problem is complicated by the presence of
two primary error sources. First of all, the measurements of the states
contain noise due to the vehicle vibration, instrument limitations, and data
processing. This results in the equation $Z = X + V$, where the measurement
vector Z is a combination of the actual state vector X and a measurement
noise vector V. In addition, some of the response of the vehicle may be due
to external disturbances such as wind gusts, and the assumed model may not
be representative of the actual vehicle. These error sources in combination
are referred to as process noise. Therefore, the problem is to determine
the components of the A and B matrices of the assumed plant in the
presence of both measurement and process noise.
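
The structure of the problem can be made concrete with a toy example. The
sketch below is not one of the identification algorithms used in this program:
it assumes a hypothetical scalar plant x_dot = a*x + b*u, simulates it by
Euler integration at 40 samples per second for 20 seconds, and recovers a and
b by a simple equation-error least-squares fit. It then forms noisy
measurements Z = X + V with an assumed noise level. All names and numerical
values are illustrative.

```python
import math
import random

def simulate(a, b, dt, n_steps, control):
    """Euler integration of the scalar plant x_dot = a*x + b*u from x(0) = 0."""
    xs, us = [0.0], []
    for k in range(n_steps):
        u = control(k * dt)
        us.append(u)
        xs.append(xs[-1] + dt * (a * xs[-1] + b * u))
    return xs, us

def equation_error_fit(zs, us, dt):
    """Least-squares estimate of (a, b) from finite-difference derivatives."""
    sxx = sxu = suu = sxd = sud = 0.0
    for k, u in enumerate(us):
        x = zs[k]
        d = (zs[k + 1] - zs[k]) / dt      # approximate state derivative
        sxx += x * x
        sxu += x * u
        suu += u * u
        sxd += x * d
        sud += u * d
    det = sxx * suu - sxu * sxu
    return (sxd * suu - sxu * sud) / det, (sxx * sud - sxu * sxd) / det

def control(t):
    """Assumed input: a low- plus a high-frequency sinusoid."""
    return math.sin(2.0 * math.pi * 0.2 * t) + 0.5 * math.sin(2.0 * math.pi * t)

dt = 1.0 / 40.0
xs, us = simulate(-0.5, 1.0, dt, 800, control)
a_hat, b_hat = equation_error_fit(xs, us, dt)      # noise-free states

random.seed(0)
zs = [x + random.gauss(0.0, 0.01) for x in xs]     # Z = X + V
a_noisy, b_noisy = equation_error_fit(zs, us, dt)  # estimates from noisy data
```

With noise-free states the fit recovers the assumed coefficients; with noisy
measurements the differenced data amplify the noise and the estimates are
generally less accurate, which is one reason more elaborate algorithms are
needed in practice.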

In practice, several specific problems occur in parameter identification;
and in this study additional problems associated with the VALT flight regime
and the VALT research vehicle are encountered. The general problems include
the presence of winds, which as discussed earlier introduces process noise.
Additional problems result from the form of the vehicle equations of motion
(plant) chosen to represent the vehicle. These equations are linear 6
degree-of-freedom small perturbation equations chosen for compatibility
with control system design procedures and limitations on existing parameter
identification algorithms. The equations require obtaining an accurate
and steady vehicle trim and linear response in the vehicle state variables.

The VALT flight regime introduces the problem of determining accurate
vehicle velocity measurements at low airspeeds where conventional
pitot-static instruments are useless. The vehicle itself introduces further
difficulty in that it has unstable modes and its rotors introduce high
frequency noise in the measurement system and the possibility of
rotor/fuselage coupling. Lastly, flight testing introduces the need to evaluate
on board and at the test location the accuracy and quality of the data being
acquired. The next section will discuss how the test was designed to minimize
the aforementioned problems, and to obtain an accurate and appropriate data
set. To provide a background for discussing how these problems were handled and
the testing approach taken in these flight tests, the facilities utilized are
described first.


DESCRIPTION  OF  FLIGHT  TEST  FACILITIES 


The parameter identification flight tests were carried out at NASA's
Wallops Flight Center, Wallops Island, Virginia, in March of 1977. The
Wallops facilities crucial to these flights were the Aeronautical Radar
Research Complex (ARRC radar); the Transponder Data System (TDS); wind data
measurement equipment, including a wind measurement tower and weather
balloons; and the Research Aircraft Ground Station (RAGS). The current VALT
research vehicle is a Boeing-Vertol CH-47 transport helicopter from NASA's
Langley Research Center. Each of these systems is briefly described below.

The  ARRC  radar  facility  consists  of  an  FPS-16  radar  used  in  conjunction 
with  a laser  tracking  radar  to  provide  vehicle  position  data  accurate  to 
one  foot.  This  information  is  processed  by  a minicomputer  within  the  facility 
to  provide  highly  accurate  data  in  a Cartesian  coordinate  system  aligned  with 
the  runway  chosen  for  each  day's  flights.  The  data  is  then  telemetered  to 
the vehicle using the Transponder Data System (TDS). The TDS is a data link
that uses the time between radar pulses to send pulse position modulated
(PPM) digital data to and from the vehicle on the same frequencies as the
ground radar (uplink) and the airborne transponder (downlink). The data
transmission rate is one ten-bit digital word on both uplink and downlink per
pulse  of  the  radar,  and  for  these  tests  was  configured  to  give  approximately 
3<*  complete  position  updates  per  second  to  the  on-board  digital  computer. 

The  ARRC  radar  facility  was  also  used  to  track  weather  balloons  released 
at  regular  time  intervals  to  obtain  wind  velocity  and  direction  information 
at 100-foot intervals from 200 feet to 2,500 feet. A 100-foot weather data
tower was used to obtain low altitude and surface wind data.




The RAGS is a mobile station with a telemetry link to the aircraft
measurement system as well as magnetic tape playback equipment. It provides
the capability for both real-time data display of selected parameters
and a post-flight quick look at all of the measured parameters.

As previously mentioned, the research vehicle is a Boeing-Vertol CH-47
tandem rotor transport helicopter equipped with a fly-by-wire control system.
The cockpit has both a standard mechanical control stick arrangement (the
safety pilot) as well as an electrical stick (the research pilot). The
mechanical  control  arrangement  controls  the  position  of  the  vehicle's 
actuators.  The  electrical  stick  serves  as  input  to  the  computing  system, 
which  can  manipulate  the  signals  in  a variety  of  ways  through  programming  of 
the  Sperry  1819A  digital  flight  computer.  Outputs  from  the  Sperry  1819A  are 
converted  to  analog  signals  used  as  inputs  to  electrohydraulic  actuators. 

The  outputs  from  these  actuators  are  then  used  to  drive  the  standard 
mechanical  control  stick  arrangement  through  a clutch  arrangement,  which 
allows  rapid  disconnection  of  the  computing  system  in  the  event  a potentially 
dangerous  control  input  to  the  vehicle  is  generated. 

The Sperry 1819A flight computer is a general purpose, fixed-point
18-bit stored program integer machine with 16,384 words of ferrite core
memory for program and data storage. This computer communicates through
a variety of interfaces to the research pilot's control sticks, motion
sensors, the control system actuators, the transponder data system, and its
own control panels, which allow data examination and modification.

Measurement, recording, and telemetering of spatial, control, and
discrete variables is handled by the Piloted Aircraft Data System (PADS),
a pulse code modulated (PCM) recording system. Sensor outputs are first
routed through buffer amplifiers and then sent to the computing system,
the on-board recording system, and the telemetry system.


TESTING  APPROACH 


The first category of flight testing problems (accurate knowledge of
the winds, precise aircraft trim, and low-speed air data measurements) was
handled through the combined use of the ARRC radar facility, the TDS, the
on-board digital computer, and an electromechanical flight director. As
described previously, wind data were obtained at periodic time intervals
by releasing and radar tracking a weather balloon. The subsequent reduction
of the radar track provided wind velocity and direction at regular altitude
intervals.

Accurate low-speed air data measurements were obtained through
processing of radar derived position data (telemetered to the vehicle using
the TDS) with on-board acceleration measurements in a complementary filter
implemented in the Sperry 1819A digital computer to obtain an estimate of
ground speed. To this ground speed estimate, the current wind velocity was
added so that when flying directly into the wind an accurate estimate of
airspeed was obtained. This airspeed determination system was used for all
flights and covered the range from hover to 80 knots.
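
The idea of blending position and acceleration data can be sketched as follows.
The filter structure, gains, sample rate, and numerical values below are
assumptions chosen for illustration; the paper does not give the details of the
filter implemented in the flight computer. Position fixes correct the slow
drift of the estimate while the acceleration measurement supplies the fast
changes, and the wind speed is then added for flight directly into the wind.

```python
def complementary_filter(positions, accelerations, dt, omega=1.0, zeta=0.7):
    """Blend position fixes (low frequency) with accelerations (high frequency).

    Returns the ground-speed estimate after processing all samples.
    """
    k1, k2 = 2.0 * zeta * omega, omega * omega   # assumed filter gains
    p_hat, v_hat = positions[0], 0.0
    for p_meas, a_meas in zip(positions, accelerations):
        err = p_meas - p_hat                     # position innovation
        p_next = p_hat + dt * (v_hat + k1 * err)
        v_next = v_hat + dt * (a_meas + k2 * err)
        p_hat, v_hat = p_next, v_next
    return v_hat

def airspeed_into_wind(ground_speed, wind_speed):
    """Airspeed estimate when flying directly into a known steady wind."""
    return ground_speed + wind_speed

dt = 0.025                                       # assumed update interval, s
true_ground_speed = 10.0                         # hypothetical constant speed
positions = [true_ground_speed * k * dt for k in range(800)]
accelerations = [0.0] * 800                      # unaccelerated flight
v_hat = complementary_filter(positions, accelerations, dt)
airspeed = airspeed_into_wind(v_hat, wind_speed=5.0)
```

In this idealized case the speed estimate settles on the true ground speed
within a few filter time constants, even though it starts from zero.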

Precise  trim  conditions  were  established  by  using  an  electromechanical 
flight  director,  driven  by  the  flight  computer,  indicating  deviation  from 
desired trim. Figure 2 is a photograph of the research pilot's cockpit and
shows, in addition to the flight director, other standard aircraft
instruments. Starting at the top on the left-hand side are an airspeed indicator,
a torque meter, and a flight-attitude indicator. At the right, starting at
the top, are an altimeter, a vertical-speed indicator, and a magnetic compass.

The  CRT  shown  just  below  center  is  used  for  display  evaluation,  but  was 
not  used  in  these  flights.  The  flight  director  horizontal  pointer  was 
used  to  indicate  error  from  desired  airspeed;  the  vertical  bar,  error  from 
desired  sideslip;  the  doughnut  (at  the  left  side),  error  from  desired 
descent  rate;  and  the  localizer  (at  the  bottom),  error  from  desired  rate 
of  turn.  The  pilot's  task  to  obtain  precise  trim  was  to  simultaneously 
center  the  four  flight  director  pointers.  To  accomplish  this  task,  the 
pilot would first obtain an approximate trim using the standard aircraft
instruments, and then focus his attention on centering the four flight
director pointers. Gains and damping for each flight director pointer were
individually  selectable  through  the  entry  of  appropriate  constants  in  the 
Sperry  1819A  computer.  This  feature  allowed  the  flight  director  to  be 
"tuned"  to  the  pilot  to  obtain  the  most  satisfactory  overall  performance. 

The second category of problems was handled through the combination
of control input design, its implementation in the on-board digital computer,
and the electrically-driven control surface actuators. The basic control
input design was carried out under contract to NASA's Langley Research Center
by Systems Control, Inc. of Palo Alto, California. These designs were based
on exciting the Stability Augmentation System-on closed-loop modes of an
analytic model of the CH-47, and consisted of a high and low frequency sinusoid.
Figure 3 represents a typical control input generated by the flight computer
for the pitch axis and shows the two components of the designed control input.

To strike a balance between adequate model excitation and the linearity
constraints on the vehicle response imposed by the small perturbation model
used in the parameter identification sequence, scaling was provided in the
computer implementation of the automatic control inputs. Repeatability and
accurate knowledge of the control input were inherent in the digital computer
implementation. To account for a known speed instability at higher airspeeds,
a longitudinal stabilization input (also implemented in the digital computer)
was added to the programmed input to maintain the resultant vehicle response
within the small perturbation equations' linearity constraints.
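
A programmed input of this kind is simple to generate digitally. The sketch
below assumes hypothetical frequencies and amplitudes for the two sinusoidal
components and a percentage scale factor of the sort described above; none of
these numbers come from the actual control input design, and the function name
is illustrative.

```python
import math

def programmed_input(t, percent, low_hz=0.1, high_hz=1.0,
                     low_amp=1.0, high_amp=0.5):
    """Scaled sum of a low- and a high-frequency sinusoid at time t (seconds)."""
    scale = percent / 100.0
    low = low_amp * math.sin(2.0 * math.pi * low_hz * t)
    high = high_amp * math.sin(2.0 * math.pi * high_hz * t)
    return scale * (low + high)

dt = 1.0 / 40.0   # assumed sample interval
samples_full = [programmed_input(k * dt, 100.0) for k in range(800)]
samples_half = [programmed_input(k * dt, 50.0) for k in range(800)]
```

Because the input is computed rather than flown by hand, the same waveform
can be repeated exactly at any chosen magnitude.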

Two major systems were primarily used to address the third category
of problems, real-time data evaluation. The Piloted Aircraft Data System
(PADS) on board the vehicle was used both to record a wide selection of
measurements on magnetic tape and also to telemeter a subset of these
measurements to the Research Aircraft Ground Station (RAGS) for subsequent
real-time display on multi-channel chart recorders. In the RAGS, transparent
overlays of the expected measurements, prepared earlier using the CH-47
analytic model (Ref. 2), were then compared with the real-time data for use
in evaluating the success of a particular run. This information was then
relayed to the research project engineer on board the helicopter for his
use in determining the next flight test point. After each flight, the
on-board tape was used in the RAGS to create additional stripchart recordings
of measurements that proved useful in planning subsequent flights.


TEST  POINT  SEQUENCE 


Figure 4 is a pictorial of NASA's Wallops Flight Center which illustrates
the systems used by this series of flight tests. Each of these systems has
been described earlier. This figure is useful in understanding the sequence
of events in obtaining flight test points.

Since  the  airspeed  estimator  required  the  vehicle  to  be  flown  into  the 
wind  for  all  test  points,  the  test  sequence  naturally  divided  into  a downwind 
leg  and  an  upwind  leg.  On  the  downwind  leg,  a weather  balloon  is  released 
and  tracked  by  the  radar  to  obtain  the  requisite  wind  data.  This  data  is 
then  relayed  via  radio  to  the  research  project  engineer  on  board  the  helicopter, 
who  then  decides  what  test  points  will  be  flown,  and  establishes  the  constant 
wind  velocity  to  be  entered  into  the  digital  computer  for  airspeed  estimator 
calculation. 

On the upwind leg, the research project engineer first selects the test
point (based on wind magnitude), then provides the computer operator with
its reference number and the desired magnitude (in per cent) of the
computer-generated control input. When the computer operator enters these
values, the appropriate trim values are obtained from a look-up table stored
in the computer, and trim error signals are sent to the electromechanical
flight director. The research pilot then obtains the desired trim, using
conventional aircraft instruments to obtain an approximate trim and the
flight director to obtain a more precise trim. When an accurate trim is
obtained, the evaluation pilot's electric stick inputs are disconnected by
the computer and programmed control inputs are substituted. At the end of an
individual data run (approximately 20 seconds), the computer system is
disengaged from the basic vehicle and the safety pilot regains control of
the helicopter, setting up for the next data run. While these activities are
underway, a comparison of the real-time data collected with the appropriate
analytic model overlay provides valuable insight into the success of the
test point.

The results of this evaluation are then relayed to the research project
engineer on board to aid in his selection of the next data point. Typically,
several data points were obtained during each upwind leg, and wind
information was updated during each downwind leg.


PRELIMINARY  RESULTS 


The post-flight processing consists of converting the PADS data tapes
to engineering units and selecting the best data sets for each flight
condition. Selection is based upon attained trims and state variable
responses. For data from helicopters, better identification results have
been obtained from filtered flight data measurements. For this study,
the data has been filtered by a zero-phase-shift Graham digital filter
(Ref. 3) with cutoff and termination frequencies chosen above any expected
rigid body modes and below frequencies associated with the rotor system.

This step reduces the noise content of the measured state variables
appreciably and provides only rigid body vehicle responses. The data is
further processed using a Kalman filter/estimator based upon the aircraft
kinematic equations. The Kalman filter estimates and removes the
measurement biases, and provides estimates of the vehicle states based on
measured attitudes, rates, and accelerations.
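
The zero-phase low-pass filtering step described above can be sketched as
follows. This is an illustration only and is not the Graham filter of Ref. 3:
it substitutes a generic symmetric windowed-sinc kernel with a single cutoff,
and the sample rate, cutoff frequency, and kernel length are hypothetical
values chosen for the example. A symmetric kernel centred on each sample is
what gives zero phase shift.

```python
import math

def lowpass_kernel(cutoff_hz, sample_rate_hz, half_width):
    """Symmetric windowed-sinc low-pass kernel of length 2*half_width + 1."""
    fc = cutoff_hz / sample_rate_hz  # cutoff as a fraction of the sample rate
    taps = []
    for n in range(-half_width, half_width + 1):
        # Ideal low-pass impulse response (sinc), with the n == 0 limit handled.
        if n == 0:
            ideal = 2.0 * fc
        else:
            ideal = math.sin(2.0 * math.pi * fc * n) / (math.pi * n)
        # Hamming window tapers the truncated sinc to reduce ripple.
        window = 0.54 + 0.46 * math.cos(math.pi * n / half_width)
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # unity gain at zero frequency

def zero_phase_filter(samples, kernel):
    """Apply a symmetric kernel centred on each sample (no phase shift)."""
    half = len(kernel) // 2
    last = len(samples) - 1
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - half, 0), last)  # hold end values at the edges
            acc += w * samples[j]
        out.append(acc)
    return out

# Hypothetical example: a slow 0.5 Hz response plus 20 Hz "rotor" content,
# sampled at 100 Hz and filtered with a 2 Hz cutoff.
rate = 100.0
raw = [math.sin(2.0 * math.pi * 0.5 * i / rate)
       + 0.3 * math.sin(2.0 * math.pi * 20.0 * i / rate) for i in range(400)]
smooth = zero_phase_filter(raw, lowpass_kernel(2.0, rate, 25))
```

Because the kernel is applied after the flight, it can look both forward and
backward in time, which is why a zero-phase design is available for
post-flight processing but not for real-time display.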

After data reduction and prefiltering, the data sets are ready for
parameter identification processing. This data will ultimately be processed
using two different advanced algorithms capable of handling both
measurement and process noise, and the results will be compared. An
Extended Kalman Filter algorithm (Refs. 4 and 5) will be used by USARTL
personnel to identify six degree-of-freedom stability and control
derivatives, and a maximum likelihood algorithm (Ref. 6) will be used by
NASA personnel. Selected data sets will also be processed under contract by
SCI (Vt.), who will likewise use a maximum likelihood approach.

Some preliminary results are presently available from the Extended
Kalman Filter algorithm, and the major derivative values identified are
compared with existing analytical values in Figure 5. The majority of the
identified derivatives agree very well with the analytically-predicted
values. This agreement is encouraging, since responses generated from the
analytical values were very close to those measured in flight.
Figure 6 shows the eigenvalues (characteristic roots) for both the identified
and analytical derivatives. Good agreement between analytical and identified
results is shown, with all the basic vehicle modes represented, including
the expected unstable Dutch roll mode and speed instability. The results
presented are preliminary, and many data sets remain to be processed. Final
acceptance of the derivatives will be based upon a combination of tests,
including comparison with analytical values and expected values based on
engineering judgment, responses generated by identified derivatives
(regeneration), responses generated by identified derivatives for data not
used in the identification process (simulation), derivative uncertainties
and convergence characteristics, and comparison of eigenvalues (roots)
computed using identified derivatives with analytical results and engineering
judgment.
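
The eigenvalue comparison mentioned above can be illustrated with a small
sketch. The vehicle model in this study has six degrees of freedom; the
example below uses a two-state matrix only so that the roots can be written
in closed form, and every number in it is hypothetical (none is taken from
Figure 5 or Figure 6). A root with a positive real part marks an unstable
mode.

```python
import cmath

def eigenvalues_2x2(a):
    """Characteristic roots of a 2x2 state matrix [[a11, a12], [a21, a22]]."""
    trace = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = cmath.sqrt(trace * trace - 4.0 * det)
    return ((trace + disc) / 2.0, (trace - disc) / 2.0)

def is_unstable(roots):
    """A mode is unstable when its root has a positive real part."""
    return any(r.real > 0.0 for r in roots)

# Hypothetical derivative sets for illustration only.
analytical = [[0.02, 0.10], [0.05, -0.80]]
identified = [[0.03, 0.11], [0.05, -0.75]]

for name, matrix in (("analytical", analytical), ("identified", identified)):
    roots = eigenvalues_2x2(matrix)
    print(name, roots, "unstable" if is_unstable(roots) else "stable")
```

Comparing roots rather than individual derivatives checks whether the
identified model reproduces the vehicle modes, even when single derivatives
differ somewhat from their analytical values.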


CONCLUDING  REMARKS 


A specialized flight test was designed and implemented to provide data
acceptable for parameter identification for an unstable rotorcraft operating
in the presence of winds at flight conditions from hover through transition
to cruise. General problems in parameter identification and flight testing,
and problems specific to this flight test, were considered, and a unique
test procedure utilizing existing facilities was performed. Preliminary
data processing has resulted in identified parameters which agree well
with existing analytical results.

[Figure: Evaluation Pilot's Cockpit Instruments]

Figure 5. Derivative Comparison
EXPERIMENTAL DESIGNS FOR SENSITIVITY EXPERIMENTS OF
COMPUTER SIMULATION MODELS

Carl B. Bates

US Army Concepts Analysis Agency
Bethesda, Maryland

ABSTRACT. Large stochastic computer simulation models usually
have a large number of input variables. After model development
and/or before the model is used for production runs or used in a
particular study, sensitivity testing of input variables is usually
required. Because of the size of the model and the intended
future use of the model, the list of input variables desired to be
tested is invariably long. Also, because of the absence of a priori
information on the interaction of input variables, the experimental
design for the sensitivity experiment must provide for the
testing of main effects and first-order interactions. The application
of fractional factorial designs in sensitivity testing is
illustrated, and their shortcomings for sensitivity testing of
large computer simulation models are discussed.

1. INTRODUCTION


The US Army Concepts Analysis Agency (USACAA) is a staff support
agency under the Deputy Chief of Staff for Operations and
Plans (DCSOPS). The agency's mission is to conduct mid- and long-range
force concept studies to establish the framework and guidance
for development of doctrine, organizations, and materiel requirements
for Army forces. Agency studies and analyses support
Department of the Army planning and programming and provide the
basis for materiel acquisition. The Agency develops, within resource
constraints, the most effective force structure and weapon
and/or system mix. The primary tools for the performance of the
studies are computer simulation models. After computer simulation
model development and/or before a model is used in a particular
study, sensitivity testing of input variables is usually required.
That is, if no a priori knowledge exists concerning model sensitivity,
an investigation must be made of the sensitivity of selected
output variables to changes in input variables. This is
necessary in order to evaluate model performance or to assess the
ability of models to satisfy specific study requirements.

The models range from high resolution, low (division) level,
to low resolution, high (theater) level models. A commonality,
however, of all the models is their size and complexity. All of
the simulation models are large and very complex. The number of
input variables is in the hundreds and the number of input data
items is in the thousands.




2.  PROBLEM  DESCRIPTION 


Statisticians at CAA are within a service support Directorate.
They provide experimental design and statistical analysis
support to all study Directorates within the agency. Analysts who
are study team members and who have responsibility for model
sensitivity testing of a particular model come to the statisticians
with experimental design problems.

Invariably, the list of input variables to be investigated is on
the order of 50 to 100 variables. One case
involved 350 variables. Naturally, time constraints never permit
a thorough investigation of all variables on the original "laundry
list" of variables. Because no a priori information exists, the
minimum objective of the sensitivity testing is to test and estimate
main effects and first-order interactions.

The candidate variables for testing are those suspected
of being significant. The small subset of input variables ultimately
tested are those most strongly suspected of being highly
significant. That is, the variables eventually tested are anticipated
and expected to significantly influence model output. Past
experience in model sensitivity testing has shown that, in general,
most input variables ultimately tested are, in fact, significant.
Moreover, most of the first-order interactions are also
significant. With study and hindsight, it is generally agreed
that this is consistent with reality. That is, the simulation
model does adequately portray the real world which does, in fact,
consist of many interacting parts or components.

3.  EXAMPLE  PROBLEM 

A recent experimental design problem involved a sensor model.
It had been decided that three levels would be investigated for
each input factor considered. The pessimistic estimate of the
number of model runs that could be made was 100, and the optimistic
estimate was 250 runs. All input factors under consideration were
completely crossed. Therefore, a factorial experiment in a completely
randomized design was appropriate for the computer simulation model
sensitivity experiment. Two designs were ultimately developed,
one requiring approximately 100 runs and the other requiring
approximately 250 runs.

A (1/9) x 3^7 fractional factorial experiment requiring 243
model runs was designed using I = ABCDE = CD^EF^ as the defining
contrast. The design, plan 9.7.9 in Connor and Zelen (1959), permits
estimation of the 7 main effects and the C(7,2) = 21 first-order
interaction effects. The ANOVA table is given below.

Table 1. ANOVA for the (1/9) x 3^7 Design

Source                           DF

7 main effects                   14
21 first-order interactions      84
residual                        144
total                           242
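
The degrees of freedom in Table 1 can be checked with a short worked
calculation. It uses only the standard result that a three-level factor
carries 3 - 1 = 2 degrees of freedom and that a two-factor interaction
between three-level factors carries (3 - 1)(3 - 1) = 4.

```latex
\begin{aligned}
\text{main effects:} \quad & 7\,(3-1) = 14 \\
\text{first-order interactions:} \quad & \binom{7}{2}(3-1)^2 = 21 \times 4 = 84 \\
\text{total:} \quad & 3^{7-2} - 1 = 243 - 1 = 242 \\
\text{residual:} \quad & 242 - 14 - 84 = 144
\end{aligned}
```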


A smaller fractional factorial experiment was then designed
such that its design points were a subset of the design points of
the above seven factor experiment. This was accomplished by using
the aliases from the (1/9) x 3^7 fractional factorial to determine
the five factors having a full design within the 243 design
points. The factors were B, C, D, E, and G. Then, using I = BCDEG
as the defining contrast gave a (1/3) x 3^5 fractional factorial
experiment requiring 81 model runs. The ANOVA table for the five
factor experiment is given below.

Table 2. ANOVA for the (1/3) x 3^5 Design

Source                           DF

5 main effects                   10
10 first-order interactions      40
residual                         30
total                            80

The design points of both designs were provided to the analysts
responsible for exercising the sensor model. The 81 factor
level combinations of the (1/3) x 3^5 design were run first.
Experiment execution proceeded smoothly and the remaining 162 runs
for the seven factor fractional factorial were also run. The
analysis of Table 1 was performed on each of a number of output
variables selected during the design phase of the simulation model
sensitivity experiment.
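
A one-third fraction of a 3^5 factorial with defining contrast I = BCDEG can
be generated as in the sketch below. It is a minimal illustration, assuming
levels coded 0, 1, 2 and assuming the principal fraction (level sums
congruent to 0 modulo 3); the paper does not state which of the three
possible fractions was run, and the function names are hypothetical.

```python
from itertools import product
from collections import Counter

FACTORS = ["B", "C", "D", "E", "G"]

def one_third_fraction():
    """Runs of the 3^5 factorial whose coded levels (0, 1, 2) sum to 0 mod 3."""
    return [run for run in product(range(3), repeat=len(FACTORS))
            if sum(run) % 3 == 0]

def pair_counts(runs, i, j):
    """How often each level pair of factors i and j appears in the runs."""
    return Counter((run[i], run[j]) for run in runs)

runs = one_third_fraction()
print(len(runs))  # 81

# Every pair of factors shows all 9 level combinations equally often,
# which is what allows two-factor comparisons to be balanced.
for i in range(len(FACTORS)):
    for j in range(i + 1, len(FACTORS)):
        counts = pair_counts(runs, i, j)
        assert len(counts) == 9 and set(counts.values()) == {9}
```

The balance check at the end is a necessary property of the design, not a
proof that all first-order interactions are estimable; that follows from the
alias structure of the defining contrast.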

4.  CONCLUSIONS 

Sensitivity experiments of large complex computer simulation
models involve a large number of input factors. The number of
input factors normally involved far exceeds the number of factors
involved in past field and laboratory experiments. A priori
information concerning interactions among the input factors almost
never exists. Minimum experimental objectives are, therefore,
that the design permits the estimation of main and first-order
interaction effects. Input factors selected for testing are those
suspected of being highly significant. Past experience has shown
that most main effects and first-order interaction effects are, in
fact, statistically significant.
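
The requirement to estimate all main and first-order interaction effects
sets a lower bound on the number of model runs, because a design needs at
least as many runs as parameters. The sketch below counts those parameters;
it is a counting argument only, the function name is hypothetical, and the
actual run count of any particular design plan will be larger.

```python
from math import comb

def parameters_to_estimate(num_factors, levels):
    """Grand mean + main-effect df + first-order (two-factor) interaction df."""
    main = num_factors * (levels - 1)
    interactions = comb(num_factors, 2) * (levels - 1) ** 2
    return 1 + main + interactions

for k in (7, 10, 50, 100):
    print(k, parameters_to_estimate(k, 3))
# 7 99
# 10 201
# 50 5001
# 100 20001
```

With three-level factors, a list of 50 candidate variables would need more
than 5000 runs under this bound, against a budget of 100 to 250 runs in the
example problem.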

Fractional factorials for 2^n and 3^n designs, developed by
Finney (1945) and (1946) and available in Cochran and Cox (1957),
Connor and Zelen (1957) and (1959), and Davies (1960), do provide
designs which may be applied to sensitivity testing of computer
simulation models. However, the largest 2^n design in Connor and
Zelen (1957) which yields estimable first-order interactions is
for 15 factors. The design has 256 design points. The largest 3^n
design in Connor and Zelen (1959) which gives estimable first-order
interactions is for 10 factors, and it has 243 design points.
The large number of computer simulation model runs required by
fractional factorial designs does not normally permit assessment of
the number of input factors desired when performing sensitivity
experiments of large computer simulation models. Designs that contain
fewer design points than fractional factorial designs but still
permit the testing of main and first-order interaction effects are
needed. Tabulations and catalogs of designs and/or computer software
for generation of the designs are also needed. Analysis
methodology, as well as fast and efficient software for performing
the statistical analysis dictated by the designs, is naturally
required.
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Abstract 


The research reported here focused on an ARM/CM field data
analysis and on model fitting using time-series techniques. The
immediate objective is to build, for the field data, an adequate
model that fits a noise signal corrupting a deterministic one. The
data happened to be seasonal and nonstationary. The ultimate goal,
however, is to use the generated model in updating an all-digital
computer simulation model, and to be able to use simulation data and
field data in validating the model. A few computer programs have been
developed to help in the data analysis and in fitting and checking the
adequacy of selected models. The fitted model is of the integrated
autoregressive moving-average type.
STATISTICAL VALIDATION OF
GUIDED PROJECTILE/MISSILE SIMULATION MODELS

Harold L. Pastrick

Guidance and Control Directorate
Technology Laboratory
US Army Missile Research and Development Command
Redstone Arsenal, Alabama 35809

ABSTRACT. This paper discusses the statistical analysis which is
proposed for aiding in the validation of several Laser Designator/Weapon
System Simulation models. The primary objective is to provide a means
for insuring that simulation responses to input signals match hardware
responses under similar driving conditions to some "goodness-of-fit"
criteria. The method involves generating several statistics on the
point by point differences between the "true" data and the simulation
data. These statistics include subinterval mean errors, confidence
bounds for those errors, Theil's Inequality Coefficient, and the
cumulative mean error.

I. BACKGROUND. Simulations of guided projectile and missile systems
are used for a variety of purposes including flight stability analyses,
trajectory studies, and lethality predictions. The computer simulation
of these systems in many ways predicts the results that may be
obtained only by actual flight tests or enhances analyses already
generated by flight data. The potential for significant cost savings by using
simulations vis-a-vis flight tests creates a firm case for making many
program judgements based on simulation data with the understanding that
they are truly representative of the real world. The general skepticism
that program managers and decision makers previously placed on simulation
data is slowly being replaced by their belief in simulation results given
that a quantitative match, to some level of confidence, can be established
between hardware and simulation models.

Recently a computer program entitled "Laser Designator/Weapon System
Simulation" (LDWSS) was generated to enable program managers for
COPPERHEAD, HELLFIRE, and Ground Laser Designators as well as Army policy
makers to judge alternatives among those systems. A significant objective
in the LDWSS chronology is to validate the projectile/missile
characteristics modeled in the software. The approach is being directed
toward generating simulation responses under specified input conditions
that match some level of goodness-of-fit to the actual hardware. LDWSS
is the product of an evolution of simulations of semiactive laser guided
missiles which had been developed by US Army Missile Research and
Development Command (MIRADCOM) Technology Laboratory. Modeling formats and
computer executive structures which had been proven in prior missile
simulations were used as the base from which LDWSS was built [1-4].


The one-on-one engagement scenario employed a fixed foreground false
target and a randomly selected background or overspill target. The
randomly selected distance between the tank target and the background false
target was based upon a statistical representation of this parameter
obtained for certain observation posts in a digitized terrain model.
All energy returns were subjected to appropriate geometric and atmospheric
attenuation to determine the reflected energy received at the seeker.
Utilizing seeker false target rejection logic to select the return to
be tracked (tank or false target), the selected track point for each
pulse was used as input to the appropriate dynamic model of seeker and
delivery system. An overview of the organization of LDWSS and associated
data relative to simulation elements has several features. The executive
structure is designed to preserve a great deal of the internal system
operation information which is generated during the calculation of hit
probabilities [5].

II. DATA BASE. A simulation was developed to generate a set of
meaningful statistics which aided in the validation of several of the
models used in LDWSS [6]. The models examined included HELLFIRE and
COPPERHEAD components. In general, model validation was accomplished by
comparing "real" data with that generated by the appropriate LDWSS
subroutines (under identical input conditions). The real data came from
either field experiments or the hardware-in-the-loop simulation. In any
case, two sets of data were generated. They are referred to as actual
(real world data) and simulation data (LDWSS). Figure 1 is a sample plot
of these data. In actuality, both curves are generated from digital
simulations of an actuator. The outputs shown are time response curves
to a step function input. Figure 2 is a plot of the point by point
difference (actual - simulation) between the two curves. The more closely
the two curves are alike, the smaller the residuals. These residuals form
the basis for the statistical analysis programs.

The  derivation  of  the  statistics  used  for  verification  has  already 
been  covered  in  detail  [7,  8]  and  will  only  be  reviewed  briefly  here. 

The  time  series  shown  in  Figures  3 and  4 are  analysed  on  a subinterval 
and  a cumulative  basis,  respectively. 

Subinterval  Statistics 

a) Mean residuals between the real and simulated data are
defined as:

$$\bar{e}_j = \frac{1}{n'} \sum_{k=1}^{n'} \left[ A_{n'(j-1)+k} - S_{n'(j-1)+k} \right] \qquad (1)$$

where $A_i$ is the ith sample from the real data, $S_i$ is the ith sample from
the simulated data, j is the subinterval counter, and n' is the number of
points on the subinterval.

Figure 3. Subinterval TIC

Figure 4. Cumulative mean residual and TIC
(real versus hardware model).

b) Confidence bounds on $\bar{e}_j$ are given by:

$$LB_j = \bar{e}_j - \delta \sqrt{\sigma_j^2 / n'} \qquad (2)$$

$$UB_j = \bar{e}_j + \delta \sqrt{\sigma_j^2 / n'} \qquad (3)$$

where $\sigma_j^2$ is the variance of the residuals on the jth interval
corrected for correlation effects and $100\,(1 - 1/\delta^2)$ is the percent
confidence desired.

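c) Theil's Inequality Coefficient (TIC)

$$TIC_j = \frac{\sqrt{\dfrac{1}{n'} \sum_{k=1}^{n'} \left[ A_{n'(j-1)+k} - S_{n'(j-1)+k} \right]^2}}{\sqrt{\dfrac{1}{n'} \sum_{k=1}^{n'} A_{n'(j-1)+k}^2} + \sqrt{\dfrac{1}{n'} \sum_{k=1}^{n'} S_{n'(j-1)+k}^2}} \qquad (4)$$

d) Theil's Coefficient of Unequal Central Tendency

$$U^M = \left[ \frac{\bar{S} - \bar{A}}{NUM} \right]^2 \qquad (5)$$

where $\bar{S}$ is the mean of the $S_i$ on interval j, $\bar{A}$ is the mean
of the $A_i$ on interval j, and NUM is the numerator given in Equation (4).

e) Theil's Coefficient of Unequal Variation

$$U^S = \left[ \frac{\sigma_A - \sigma_S}{NUM} \right]^2 \qquad (6)$$

where $\sigma_A$ is the sample standard deviation of the $A_i$ on interval j
and $\sigma_S$ is the sample standard deviation of the $S_i$ on interval j.

f) Theil's Coefficient of Imperfect Covariation

$$U^C = \frac{2\,(1 - r)\,\sigma_S \sigma_A}{NUM^2} \qquad (7)$$

where r is the correlation coefficient between the $A_i$ and $S_i$ on interval j.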

In  addition  to  the  subinterval  statistics,  two  cumulative  statistics 
are  also  computed.  These  are  cumulative  mean  residual  and  cumulative  TIC. 
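
The subinterval and cumulative statistics listed above can be sketched in a
few lines. This is an illustrative implementation under stated assumptions,
not the statistics package used in the study: the residual variance is used
without any correction for correlation, standard deviations use divisor n so
that the three Theil proportions sum to one, the cumulative quantities are
taken over all points from the start of the record to the end of each
subinterval, and all function names are hypothetical.

```python
import math

def _mean(xs):
    return sum(xs) / len(xs)

def _std(xs):
    # Divisor n (not n - 1), chosen so the three Theil proportions sum to 1.
    m = _mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def chebyshev_delta(confidence):
    """Multiplier delta satisfying confidence = 1 - 1/delta**2."""
    return 1.0 / math.sqrt(1.0 - confidence)

def tic(actual, sim):
    """Theil's inequality coefficient; 0 means the two series are identical."""
    n = len(actual)
    num = math.sqrt(sum((a - s) ** 2 for a, s in zip(actual, sim)) / n)
    den = (math.sqrt(sum(a * a for a in actual) / n)
           + math.sqrt(sum(s * s for s in sim) / n))
    return num / den if den > 0.0 else 0.0

def interval_stats(actual, sim, confidence=0.95):
    """Mean residual, confidence bounds, TIC and Theil proportions."""
    n = len(actual)
    resid = [a - s for a, s in zip(actual, sim)]
    e_bar = _mean(resid)
    # Assumption: plain residual variance, no correction for correlation.
    half = chebyshev_delta(confidence) * _std(resid) / math.sqrt(n)
    out = {"mean_residual": e_bar, "lower": e_bar - half,
           "upper": e_bar + half, "tic": tic(actual, sim)}
    mse = sum(r * r for r in resid) / n  # NUM squared
    if mse > 0.0:
        a_bar, s_bar = _mean(actual), _mean(sim)
        sd_a, sd_s = _std(actual), _std(sim)
        cov = sum((a - a_bar) * (s - s_bar) for a, s in zip(actual, sim)) / n
        out["um"] = (s_bar - a_bar) ** 2 / mse
        out["us"] = (sd_a - sd_s) ** 2 / mse
        # Same as 2 (1 - r) sd_a sd_s / NUM^2, written to avoid dividing by
        # zero when one series is constant.
        out["uc"] = 2.0 * (sd_a * sd_s - cov) / mse
    return out

def analyse(actual, sim, points_per_interval, confidence=0.95):
    """Subinterval statistics plus cumulative mean residual and TIC."""
    results = []
    n_intervals = len(actual) // points_per_interval  # leftover points ignored
    for j in range(1, n_intervals + 1):
        lo, hi = (j - 1) * points_per_interval, j * points_per_interval
        row = interval_stats(actual[lo:hi], sim[lo:hi], confidence)
        row["cum_mean_residual"] = _mean(
            [a - s for a, s in zip(actual[:hi], sim[:hi])])
        row["cum_tic"] = tic(actual[:hi], sim[:hi])
        results.append(row)
    return results
```

A call such as `analyse(real, simulated, 100)` would return one dictionary
per subinterval; the proportions `um`, `us`, and `uc` are omitted for a
subinterval where the two series agree exactly, since NUM is then zero.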

III. DESCRIPTION OF EXPERIMENT. The statistics package was run on
three sets of data:

a) Real versus hardware simulation.

b) Real versus LDWSS.

c) Hardware simulation versus LDWSS.

Each time series consisted of 2020 data points with a delta time of
0.0078125 sec. The series was divided into 20 intervals each containing
100 data points. The percent confidence requested for the mean residual
was 95%. Each run produced a tabular output of the statistics as well
as several plots.
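
The run settings above fix a few derived quantities, worked out here. The
calculation assumes that the percent confidence is related to a multiplier
$\delta$ by $100\,(1 - 1/\delta^2)$, as in the confidence bound definition,
and that the 20 subintervals are taken from the start of the record.

```latex
\begin{aligned}
1 - \frac{1}{\delta^2} = 0.95 \;&\Rightarrow\; \delta = \sqrt{20} \approx 4.47 \\
\text{subinterval length:} \quad & 100 \times 0.0078125\ \text{s} = 0.78125\ \text{s} \\
\text{span of 20 subintervals:} \quad & 2000 \times 0.0078125\ \text{s} = 15.625\ \text{s} \\
\text{points not assigned to a subinterval:} \quad & 2020 - 2000 = 20
\end{aligned}
```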

The hardware chosen for the experiment is the actuator which is
shown in its most complete form. That is, the model in Figure 5
represents the best information available for the actuator. It was
subsequently reduced to the model shown in Figure 6 for use in the LDWSS
simulation program. The objectives were to determine whether the complete
model, referred to as the "hardware simulation," was well represented
by the reduced model, referred to as the "LDWSS model," and whether either
or both were high fidelity models of the hardware test data, referred to
as "real data." The real data were obtained from flight recordings of
the output of the actuator as a response to input commands. Consequently,
the input-output command and response time series history is an accurate
portrayal of the transfer function characteristics of the actuator in
Figure 7.

An example of real data compared to simulated hardware data is
shown in Figure 8. It is a plot of the mean residuals and confidence
bounds (shown as vertical lines). From this plot, it can be seen that
the means of the real and simulated data agree rather well, with small
mismatches on intervals 8 and 9, where the mean residuals are -0.23 and
-0.32, respectively. Considering the range of values for the original
data, these residuals are quite small. Remember the ideal case is zero
mean residual with a small confidence bound. Subintervals 8 and 9
correspond to the time period immediately following guidance initiate. This
is the point in the flight of the missile where the reflected laser energy
starts to contribute to the guidance loop. The small degradation in the
last couple of subintervals is due to the fact that the missile is in
terminal guidance where an acceleration in rate changes is common.

Figure 3 is a graph of the subinterval TIC for the same two
sets of data, i.e., real and hardware model time series. A TIC of zero
indicates equality between the two series, which in turn indicates that
a perfect model had been hypothesized for the actuator. By the eighth
subinterval, the value is reduced to approximately 0.05 and it remains
small thereafter. The data in Figure 4 are a more detailed view of the
same data and include the cumulative mean residual for comparison with
the cumulative TIC.

Figure 7. Real flight data digitized.


IV. CONCLUSIONS. The preceding statistics represent a small subset
of those available to the analyst for validating dynamic systems. Many
(Bibliography) agree that these can supply useful and meaningful
information for validation purposes. However, there are some who feel that
special techniques must be used to analyse nonstationary systems and the
straightforward statistical quantities (as those discussed in this
summary) are questionable in the cases where the models being analyzed
produce highly nonstationary data. Work is underway using variations of
these techniques as well as spectral techniques to circumvent the
problem. Early results appear promising.

Figure 8. Real versus hardware simulation residuals
and confidence bounds.
ANALYSIS OF VARIANCE OF MULTIVARIABLE
FLIGHT TEST DATA

A CALL FOR ASSISTANCE

James S. Hayden

US  Army  Aviation  Engineering  Flight  Activity 
Edwards  Air  Force  Base,  California  93523 


INTRODUCTION: The flight test community is frequently called upon
to define changes in performance resulting from a change in configuration
of an aircraft. Even with extreme attention to control of test
condition state variables, the problem of duplication of conditions is
an order of magnitude more difficult than in a laboratory environment.
Further complicating the problem is the fact that, depending on the
flight regime, up to three non-linear or eleven linearized independent
variables are involved. Measurement errors may be present in each of
the independent variables. Published methods for analysis of variance
are inadequate to treat this problem. A brief case history of
determination of the change in hovering performance due to a rotor system
change is presented to illustrate the problem. Measurement accuracies,
test techniques and analysis methods are discussed to highlight the
problem and suggest areas where discussion of statistical analysis
techniques would be most useful.

HELICOPTER PERFORMANCE TEST TECHNIQUES: Pre-test preparation
includes calibration of most performance instrumentation data sensors
and indicators to N.B.S. secondary reference standards. Wherever
possible "end to end" calibrations are performed on complete measurement
subsystems after installation in the test vehicle. Certain systems such
as engine torquemeters and instrumented rotor shafts are of necessity
calibrated by contractors. Prior to testing, the fully instrumented
helicopter is subjected to multiple precision weighings to accurately
determine weight and center of gravity location, and to calibrate fuel
cells. Strict inventory control of useful load items such as ballast,
armament load, parachutes, oxygen equipment, individual crew composition
and pre/post flight fuel mass is kept on a flight by flight basis.
Recalibrations and re-weighings are performed periodically during the test
program.

The vast majority of precision performance data is gathered under
stabilized conditions. Using the great outdoors as your laboratory has
esthetic advantages, but your ability to carefully control the
environment is quite limited. Smooth air is essential for all tests, and
steady winds not exceeding three knots are required for hover performance
tests. Wind is not as critical for tests performed at altitude, but
caution must be exercised to avoid mountain waves, which may seem smooth
as glass while the air mass is rising and falling sinusoidally in a
pattern relatively stationary with respect to the ground. Errors
equivalent to rates of climb of ±500 ft/min are not uncommon in these
atmospheric formations.

It should be clear that the ability of the test pilot to stabilize
the aircraft with a minimum of control motions, and to hold this
condition for the required data recording time period, is of primary
importance. On many tests the state variables are also controlled to hold
certain non-dimensional variables (to be discussed) constant for a
series of data points. This process requires the flight test engineer to
calculate a target altitude and rotor RPM for the next data point based
on cockpit observed values of airspeed, altitude, air temperature, and
fuel used. The calculations are quite involved and require use of charts,
a programmable calculator, or a telemetry down link with voice up link.
Errors which may accumulate in the various steps (engineer reading of
cockpit indicators, calculations, and pilot setting of conditions using
cockpit indicators) are reduced significantly by the use of telemetry.
The key point is that the accuracy of establishing desired flight
conditions is limited.

PERFORMANCE DATA PARAMETER MEASUREMENT ACCURACY: As has been
pointed out, flight testing is not conducted in a laboratory
environment. Test instrumentation is exposed to a host of alien
environmental factors: vibration, temperature extremes, shock, dirt, etc.
The flight test engineer quickly recognizes that brochure accuracies are
unrealistic in practical application. Experience has shown that the
following accuracies can be achieved with reasonable attention to detail.

FLIGHT TEST PERFORMANCE DATA MEASUREMENT ACCURACY

PARAMETER              SYSTEMATIC ERROR    POINT ERROR

Gross Weight           30 lb               15 lb
Engine Torque          1.5%                1%
Calibrated Airspeed    0.5 kt              0.3 kt (>100 KCAS)
Rotor Speed                                0.1%
Air Temperature                            0.5°C
Pressure Altitude                          20 ft


The impact of these uncertainties on the analysis of hovering data
will be discussed in more detail later.

HELICOPTER PERFORMANCE MODELS: The versatility of the helicopter,
expressed in its ability to fly literally in any direction, presents an
extremely complex performance analysis problem. For the purpose of
describing the subject statistical analysis challenge, we will restrict
the discussion to two important flight regimes: hover and cruise.

Non-dimensional methods are commonly used in helicopter performance
analysis. The parameters of interest, for our restricted discussion,
are power coefficient (CP), thrust (or weight) coefficient (CT),
advance ratio (μ), and advancing tip Mach number (M).
NON DIMENSIONAL PERFORMANCE PARAMETERS

\[ C_P = \frac{Q}{\rho_0 \, (\delta/\theta) \, \pi R^5 \, \Omega^2} \]

\[ C_T = \frac{W}{\rho_0 \, (\delta/\theta) \, \pi R^4 \, \Omega^2} \]

\[ \mu = \frac{\mathrm{KTAS} \times 1.68781}{\Omega R} \]

\[ M = \frac{(1 + \mu) \, \Omega R}{1116.45 \sqrt{\theta}} \]

WHERE:

CONSTANTS

$\rho_0$ = S.L. std atmospheric density, slug/ft^3.

$R$ = Rotor radius, ft.

$\pi$, 1.68781, 1116.45 = constants.

MEASURED PARAMETERS

$Q$ = Total delivered torque at rotor speed, lb-ft.

$\Omega$ = Rotor rotational speed, rad/sec.

$\delta$ = Ambient atmospheric pressure / S.L. std ambient pressure,
dimensionless.

$\theta$ = Ambient atmospheric absolute temperature / S.L. std ambient
absolute temperature, dimensionless.

$W$ = Aircraft gross weight, lb.

KTAS = True airspeed, kt.
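
The definitions above translate directly into a few lines of code. The
following is a minimal sketch, not the Activity's data reduction program.
The rotor radius (22 ft), rotor speed (324 rpm), sea-level density value,
pressure ratio, and torque used in the example call are assumptions made
for illustration; none of them is stated in the paper.

```python
import math

RHO0 = 0.0023769      # S.L. std density, slug/ft^3 (standard value, assumed here)
KT_TO_FPS = 1.68781   # knots to ft/sec (constant from the definitions)
A0 = 1116.45          # S.L. std speed of sound, ft/sec (constant from the definitions)


def nondimensional(q, w, omega, delta, theta, ktas, radius):
    """Return (CP, CT, mu, M) from torque, weight, rotor speed and ambient ratios."""
    rho = RHO0 * delta / theta                    # ambient density, slug/ft^3
    tip_speed = omega * radius                    # rotor tip speed, ft/sec
    cp = q / (rho * math.pi * radius**5 * omega**2)
    ct = w / (rho * math.pi * radius**4 * omega**2)
    mu = ktas * KT_TO_FPS / tip_speed
    mach = (1.0 + mu) * tip_speed / (A0 * math.sqrt(theta))
    return cp, ct, mu, mach


# Hypothetical hover point: 9,000 lb at 4,000 ft pressure altitude, 35 deg C.
RADIUS_FT = 22.0                          # assumed rotor radius
OMEGA = 324.0 * 2.0 * math.pi / 60.0      # assumed 324 rpm, in rad/sec
DELTA = 0.8637                            # assumed std-atmosphere pressure ratio at 4,000 ft
THETA = (35.0 + 273.15) / 288.15          # temperature ratio at 35 deg C

cp, ct, mu, mach = nondimensional(q=10000.0, w=9000.0, omega=OMEGA,
                                  delta=DELTA, theta=THETA, ktas=0.0,
                                  radius=RADIUS_FT)
print(f"CT = {ct * 1e4:.2f} x 10^-4, M = {mach:.4f}, CP = {cp * 1e5:.2f} x 10^-5")
```

With these assumed rotor values, CT and M come out close to the design
point quoted later in the paper; the torque is arbitrary, so the printed
CP has no significance.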

Hover power required, in simple terms, may be considered to be
composed of induced power (energy required to produce lift) and profile
power (energy required to overcome rotational drag of the blades).

Rotational drag is composed of a base drag, a component of drag due to
lift, an additional drag due to compressibility, and in some cases an
additional drag due to stall. In coefficient form, a model which has
proven effective is:

HOVER POWER REQUIRED MODEL

\[ C_P = A + B \, C_T^{3/2} + C \, C_T^{3} + f(C_T, M) \]

This is the equation form which will be used with the specific example
to be discussed.




Forward flight power required includes additional components:
parasite power (energy required to overcome airframe drag), additional
profile power due to forward speed (μ), and stall and compressibility
power, which is a function of μ, CT, and M. A typical forward flight
power required model is:

FORWARD FLIGHT POWER REQUIRED MODEL

\[ C_P = A(1 + 3\mu^2) + D\,[\ldots] + E\,[\ldots] + F\mu^3 + f(C_T, \mu, M) \]

(The factors multiplying D and E are illegible in the source.)

The functional relationship indicated for stall and compressibility
power understates the complexity of the phenomena. The onset of stall
is usually defined for a specific aircraft as a unique relationship
between CT and μ. This unique relationship, or boundary, is however a
function of both the drag configuration (i.e., rocket pods, doors open,
etc.) and the rotor tip Mach number. The onset of compressibility
effects is usually defined as a unique relationship between CT and M;
however, this boundary is also a function of μ.

The gross trends of these additional power components are
illustrated in Figures 1 and 2.

Now that you have been introduced to the complexity of our forward
flight problem, let's turn our attention to the simple example problem to
be used to illustrate our challenge: determination of the change in
hovering performance due to a rotor system change.

COMPARATIVE HOVERING PERFORMANCE TESTS: The United States Army
Engineering Flight Activity conducted comparative tests of two types of
rotor blade installed on an AH-1R helicopter. The tests were conducted
at field elevations from approximately 2,300 ft to 10,000 ft over a span
of approximately three months. The comparison of out of ground effect
hovering performance was only one of the many objectives of the test and
is the only subject which will be discussed here.

All tests were flown on the same aircraft with the same engine and
with the same basic instrumentation. Data were obtained with each blade
type, "back to back", at each of the three test sites.

Data were obtained by stabilizing the helicopter in hover at a skid
height of 100 ± 2 ft for a period of not less than 20 sec. Data were
recorded continuously for a period of approximately 10 sec at a sample
rate of 100 samples/sec. The data records were then edited from time
history strip charts. Acceptable data points were then edited to the
most stabilized 6 sec of record. The edited record was then used to
calculate the non-dimensional parameters based on actual data every
tenth of a second. The calculated non-dimensional parameters were then
averaged over the period. This leads to the first question to be posed
in this clinical discussion:



1. "Should data be averaged as measured, or after calculation?"
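
The two orders of operation differ because the coefficients are nonlinear
in the measured quantities. The sketch below illustrates this for CT,
which varies as the inverse square of rotor speed: the average of CT
computed sample by sample is slightly larger than CT computed from the
averaged rotor speed. The rotor radius, density, rotor speed scatter, and
the use of simulated rather than recorded samples are all assumptions
made for illustration.

```python
import math
import random
import statistics


def thrust_coefficient(w, omega, rho, radius):
    """CT = W / (rho * pi * R^4 * Omega^2), with rho the ambient density."""
    return w / (rho * math.pi * radius**4 * omega**2)


random.seed(1)
RADIUS_FT = 22.0     # assumed rotor radius, ft
RHO = 0.00192        # assumed ambient density, slug/ft^3
WEIGHT_LB = 9000.0

# 6 sec of record at one sample per tenth of a second, with simulated
# (hypothetical) scatter in rotor speed about 33.9 rad/sec.
omegas = [random.gauss(33.9, 0.3) for _ in range(60)]

# Option 1: average the measurement, then calculate the coefficient.
ct_of_mean = thrust_coefficient(WEIGHT_LB, statistics.fmean(omegas), RHO, RADIUS_FT)

# Option 2: calculate the coefficient for every sample, then average.
mean_of_ct = statistics.fmean(
    thrust_coefficient(WEIGHT_LB, om, RHO, RADIUS_FT) for om in omegas
)

relative_gap = (mean_of_ct - ct_of_mean) / ct_of_mean
print(f"averaged as measured:    CT = {ct_of_mean:.6e}")
print(f"averaged after calc:     CT = {mean_of_ct:.6e}")
print(f"relative difference:     {relative_gap:.2e}")
```

For scatter of this size the gap is small compared with the point errors
listed earlier, but it is systematic in sign and grows with the square of
the scatter.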

The data gathered during these tests are presented graphically in
Figures 3 and 4.

DATA ANALYSIS: The edited averaged data points were analyzed by
performing a multiple linear regression of the hover power required in
the form:

\[ C_P = A + B \, C_T^{3} + C \, C_T^{3/2} + D M \]

Results  of  the  regressions  are  summarized  as  follows: 


MULTIPLE LINEAR REGRESSION DATA

\[ C_P = A + B \, C_T^{3} + C \, C_T^{3/2} + D M \]

            WITH MACH NUMBER            NO MACH NUMBER

BLADE       A            B              A            B

n           82           58             82           58
A           -3.755E-8    -3.189E-8      -1.380E-7    -7.457E-8
B            3.673E+2     2.430E+1      -1.879E+2    -2.040E+2
C            9.458E-1     1.120E+0       1.376E+0     1.319E+0
D            1.250E-4     6.431E-5       0            0
R^2          9.859E-1     9.843E-1       9.832E-1     9.839E-1
s            8.965E-6     1.085E-5       9.760E-6     1.099E-5

The nominal performance design point for the AH-1R is 9,000 lb
gross weight at 4,000 ft, 35°C, or a thrust coefficient (CT) of
$55.34 \times 10^{-4}$ and a tip Mach number (M) of 0.6465. Evaluation of
the polynomials yields the following power coefficients for the two
blade sets:

CP(A) = $53.24 \times 10^{-5}$

CP(B) = $50.67 \times 10^{-5}$
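
The evaluation can be checked with a short calculation. The sketch below
evaluates the "with Mach number" regression for each blade at the design
point. It assumes that tabulated entries such as "-3.755-8" denote
$-3.755 \times 10^{-8}$ and that B multiplies $C_T^3$ while C multiplies
$C_T^{3/2}$; the function and variable names are illustrative only.

```python
# "With Mach number" coefficients as read from the regression table
# (exponent notation and term order are assumptions, see lead-in).
COEFFS = {
    "A": {"a": -3.755e-8, "b": 3.673e2, "c": 9.458e-1, "d": 1.250e-4},
    "B": {"a": -3.189e-8, "b": 2.430e1, "c": 1.120e0, "d": 6.431e-5},
}


def hover_cp(ct, mach, a, b, c, d):
    """CP = A + B*CT^3 + C*CT^(3/2) + D*M."""
    return a + b * ct**3 + c * ct**1.5 + d * mach


CT_DESIGN = 55.34e-4
MACH_DESIGN = 0.6465

cp_a = hover_cp(CT_DESIGN, MACH_DESIGN, **COEFFS["A"])
cp_b = hover_cp(CT_DESIGN, MACH_DESIGN, **COEFFS["B"])
saving = (cp_a - cp_b) / cp_a   # fractional reduction in power coefficient

print(f"CP(A) = {cp_a * 1e5:.2f} x 10^-5")
print(f"CP(B) = {cp_b * 1e5:.2f} x 10^-5")
print(f"Blade B requires about {saving * 100:.1f}% less power at this point")
```

Under these assumptions the two quoted power coefficients are recovered,
which supports the reading of the table. What the calculation does not
supply is a confidence statement on the difference between them.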

If the problem being addressed were linear with a single independent
variable, or even multivariate, the analysis of variance would be
straightforward. Recall the description of the functional relationships
given in AMCP 706-110, where X values can be measured exactly (FI) and
where errors may be present in the X measurement (FII).

Now recall the possible point errors of the present example, as
implied by instrumentation accuracies. The vectors representing the
individual effects of 1σ data errors on the non-dimensional coefficients
are illustrated in Figure 5, as are the maximum possible 1σ measurement
errors; they clearly illustrate that we are confronted with an FII
situation.
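
The size of these effects can be estimated by first-order error
propagation. The sketch below converts the point errors listed earlier
into relative errors in CT (the regression "X" variable) and CP (the "Y"
variable) at the design point. It assumes that the individual errors are
independent and may be combined by root-sum-square, and it uses a
standard-atmosphere pressure ratio model; both are assumptions of this
illustration rather than statements from the paper.

```python
import math


def pressure_ratio(h_ft):
    """Standard-atmosphere pressure ratio below the tropopause (assumed model)."""
    return (1.0 - 6.87559e-6 * h_ft) ** 5.2559


def rss(terms):
    """Root-sum-square combination of independent relative errors."""
    return math.sqrt(sum(t * t for t in terms))


# Design point and point errors taken from the text and accuracy table.
WEIGHT_LB, WEIGHT_ERR = 9000.0, 15.0
TEMP_K, TEMP_ERR = 35.0 + 273.15, 0.5
ALT_FT, ALT_ERR = 4000.0, 20.0
REL_TORQUE_ERR = 0.01         # 1% of torque
REL_ROTOR_SPEED_ERR = 0.001   # 0.1% of rotor speed

rel_delta_err = abs(pressure_ratio(ALT_FT + ALT_ERR) - pressure_ratio(ALT_FT)) \
    / pressure_ratio(ALT_FT)

# Both coefficients vary as theta / (delta * Omega^2), so the exponents on
# temperature, pressure and rotor speed are +1, -1 and -2 respectively.
shared = [TEMP_ERR / TEMP_K, -rel_delta_err, -2.0 * REL_ROTOR_SPEED_ERR]
ct_rel = rss(shared + [WEIGHT_ERR / WEIGHT_LB])   # CT is proportional to W
cp_rel = rss(shared + [REL_TORQUE_ERR])           # CP is proportional to Q

print(f"relative point error in CT: {ct_rel * 100:.2f}%")
print(f"relative point error in CP: {cp_rel * 100:.2f}%")
```

The error in CT is smaller than the error in CP but is not zero, which is
the sense in which the independent variable is not measured exactly.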
This leads to the concluding question of this clinical presentation:

2. "What procedures are recommended for calculating a specified
difference in average performance, with a chosen degree of
confidence, with a multivariable FII relationship?"

SUMMARY OF QUESTIONS:

1. "Should data be averaged as measured, or after calculation?"

2. "What procedures are recommended for calculating a specified
difference in average performance, with a chosen degree of
confidence, with a multivariable FII relationship?"


REFERENCES:

1. "Airworthiness and Flight Characteristics, Improved Main Rotor
Blade on the YAH-1R", Yamakawa et al., USAAEFA Project No. 76-08.

2. AMCP 706-110, Dec 1969.

[FIGURE 1. COMPRESSIBILITY EFFECTS ON GENERALIZED LEVEL FLIGHT
PERFORMANCE]

[FIGURE 2. RETREATING BLADE STALL EFFECTS ON GENERALIZED LEVEL FLIGHT
PERFORMANCE]

[FIGURE 3. OUT-OF-GROUND EFFECT NONDIMENSIONAL HOVERING PERFORMANCE,
YAH-1R USA S/N 70-15936, ENGINE T53-L-703 S/N LE15124Z, SKID HEIGHT =
100 FEET. Legend columns: symbol, referred rotor speed range (RPM),
density altitude (feet), OAT (°C).]

[FIGURE 4. OUT-OF-GROUND EFFECT NONDIMENSIONAL HOVERING PERFORMANCE,
YAH-1R USA S/N 70-15936, ENGINE T53-L-703 S/N LE15124Z, SKID HEIGHT =
100 FEET.]

Axes (Figures 3 and 4): engine power coefficient, CP x 10^5, versus main
rotor thrust coefficient, CT x 10^4.

NOTES:

1. Blades S/N 1005 and 1009 (blade designation illegible).

2. Vertical height from bottom of skid to center of rotor hub = 11.9 feet.

3. Winds less than 3 knots.

4. Free flight hover technique.

5. Average longitudinal C.G. (value illegible).

6. Average lateral C.G. (value illegible).

ANALYSIS OF VARIANCE: SELECTION OF A MODEL AND SUMMARY STATISTICS


Frederick  Steinheiser,  Jr.  & Kenneth  I.  Epstein 
Army  Research  Institute  for  the  Behavioral  and  Social  Sciences 

Alexandria,  VA  22333 

SUMMARY 

Three models can be used to perform ANOVA: fixed, random, or mixed.
The choice of a model is determined by the sampling plan of the
treatments; e.g., if sampling was exhaustive, then no generalization
beyond the sampled levels is allowable. Two summary statistics may also
be computed: the F-ratio (to test the hypothesis of an effect due to a
given treatment) and an index of the magnitude of experimental effect
(also called the proportion of variance accounted for by a given
treatment effect). This paper examines the relationship between ANOVA
models, summary statistics, and the inferences that can be drawn from
them. Data from a completely crossed repeated measures experiment are
presented to show how some inferences about effects can change as a
function of the model selected and the summary statistics which are then
computed.
Introduction

The topics of this paper are models for the analysis of variance
(fixed, random, or mixed ANOVA models), and the subsequent summary
statistics (F-ratio, quasi-F-ratio, and magnitude of treatment effect)
which may be computed following the ANOVA. ANOVA is a useful method
for assessing the statistical significance of treatment effects. But the
significance of an effect is a function of two decisions. First is the
selection of a model and an appropriate sampling plan for elements within
each of the treatment factors. Second is the choice of summary statistics
which indicate the extent of significance achieved. In this paper,
comparisons will be made between models, and between summary statistics.
Specific issues will be clarified concerning the interpretation of results
when various models and summary statistics are used on the same set of
data.

Selection of an ANOVA Model

In the fixed-effects model, the levels of the independent variables
are assumed to have been exhaustively sampled. No generalization beyond
those levels sampled is intended, or theoretically permissible. The
random effects model assumes that the selected treatment variables have
been randomly selected from a very large population of such variables.
Generalization of results from the random sample to the population is
allowed. The mixed model allows both fixed and random factors to be
studied in the same experiment, with the results for each factor to be
interpreted according to that factor's sampling plan.

The choice of a model has an impact upon the probability of
obtaining the observations under the null hypothesis for each treatment
(factor). Behavioral research is particularly vulnerable to the choice
of a model, because often the investigator can use only a limited sample
of the possible number of stimuli (items, drug doses, etc.). Furthermore,
because of the difficulty in creating comparable sets of stimuli, the
same stimulus set may, by necessity, be given to all subjects.

As a simple hypothetical experiment (adapted from Clark, 1973),
suppose that two classes of stimuli, nouns and verbs, are individually
shown to subjects. We want to see if it takes the same time to identify
each word as a member of the correct part-of-speech class. This simple
hypothesis will be shown to have interesting implications for both
experimental design and statistical analysis.

First of all, fixed sets of nouns and verbs which are matched on
relevant parameters, such as number of letters and frequency of
occurrence, should be prepared. If we want to be able to generalize to
the full domain of nouns and verbs, each subject should receive a
different random sample of words from the two lists. However, it is
impossible to match the words on all relevant variables. It is also
practically impossible to use a different random sample of words for each
subject.

Consider, then, the following experimental design, in which "s"
subjects are each presented "w" different nouns and verbs:

TABLE 1. Assignment of Subjects and Parts of Speech.

                        Part of Speech

Subject     P1 (nouns)              P2 (verbs)

S1          w_1, ..., w_{w/2}       w_{w/2+1}, ..., w_w
 .               .                        .
 .               .                        .
Ss

In order to compare the adequacy of the several possible F ratios for
testing the difference in response time to the two "treatment" (part of
speech) conditions, the following tables of expected mean squares will
be helpful:


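TABLE 2. EMS Assuming Parts of Speech is a Fixed Factor, and Subjects
and Words are Random.

Source                               Expected mean square

P (Part of speech)                   $\sigma_e^2 + sw\sigma_p^2 + s\sigma_{w(p)}^2 + w\sigma_{p \times s}^2 + \sigma_{s \times w(p)}^2$
W(P) (Words within part of speech)   $\sigma_e^2 + s\sigma_{w(p)}^2 + \sigma_{s \times w(p)}^2$
S (Subjects)                         $\sigma_e^2 + pw\sigma_s^2 + \sigma_{s \times w(p)}^2$
P x S                                $\sigma_e^2 + w\sigma_{p \times s}^2 + \sigma_{s \times w(p)}^2$
S x W(P)                             $\sigma_e^2 + \sigma_{s \times w(p)}^2$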


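TABLE 3. EMS Assuming Parts of Speech and Words are Fixed, and
Subjects are Random.

Source                               Expected mean square

P (Part of speech)                   $\sigma_e^2 + sw\sigma_p^2 + w\sigma_{p \times s}^2$
W(P) (Words within part of speech)   $\sigma_e^2 + s\sigma_{w(p)}^2 + \sigma_{s \times w(p)}^2$
S (Subjects)                         $\sigma_e^2 + pw\sigma_s^2$
P x S                                $\sigma_e^2 + w\sigma_{p \times s}^2$
S x W(P)                             $\sigma_e^2 + \sigma_{s \times w(p)}^2$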


If we choose to test the significance of the Parts of Speech treatment,
the appropriate F-ratio for the model illustrated in Table 3 (words
fixed) is $F_1 = MS_p / MS_{p \times s}$. The only term in the numerator
that is not in the denominator is $sw\sigma_p^2$. However, if this same
F-ratio is used with the model in Table 2 (applicable when generalization
is desired to all nouns and verbs), then the numerator will contain two
terms that are not in the denominator: $s\sigma_{w(p)}^2$ and
$sw\sigma_p^2$. Using alternative error terms in the parts of speech
fixed, words random model (Table 2) leads to the same problem. For
example, if we test the parts of speech effect against the words within
parts of speech effect, we obtain $F_2 = MS_p / MS_{w(p)}$. In this case,
$EMS_p$ exceeds $EMS_{w(p)}$ by the amount
$w\sigma_{p \times s}^2 + sw\sigma_p^2$. Therefore, this $F_2$ ratio could
also be significant when the true contribution due to parts of speech
(treatments) is really zero. In summary, both $F_1$ and $F_2$ could be
significant when $\sigma_p^2 = 0$, provided that $\sigma_{w(p)}^2$ and
$\sigma_{p \times s}^2$ exceed zero.
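
The comparison of numerators and denominators can be carried out
mechanically. The sketch below records which variance components appear
in each expected mean square (coefficients are omitted) and lists the
components left over in the numerator of a proposed ratio. The component
sets are this editor's reading of the two EMS tables, and all names are
illustrative.

```python
from collections import Counter

# Variance components present in each expected mean square.
EMS_WORDS_RANDOM = {
    "P": {"e", "p", "w(p)", "pxs", "sxw(p)"},
    "W(P)": {"e", "w(p)", "sxw(p)"},
    "S": {"e", "s", "sxw(p)"},
    "PxS": {"e", "pxs", "sxw(p)"},
    "SxW(P)": {"e", "sxw(p)"},
}
EMS_WORDS_FIXED = {
    "P": {"e", "p", "pxs"},
    "W(P)": {"e", "w(p)", "sxw(p)"},
    "S": {"e", "s"},
    "PxS": {"e", "pxs"},
    "SxW(P)": {"e", "sxw(p)"},
}


def extra_terms(ems, numerator, denominator):
    """Components in the summed numerator EMS not matched in the denominator."""
    num = sum((Counter(ems[src]) for src in numerator), Counter())
    den = sum((Counter(ems[src]) for src in denominator), Counter())
    return set((num - den).keys())   # Counter subtraction keeps positive counts


f1_fixed = extra_terms(EMS_WORDS_FIXED, ["P"], ["PxS"])
f1_random = extra_terms(EMS_WORDS_RANDOM, ["P"], ["PxS"])
f2_random = extra_terms(EMS_WORDS_RANDOM, ["P"], ["W(P)"])
quasi_random = extra_terms(EMS_WORDS_RANDOM, ["P", "SxW(P)"], ["PxS", "W(P)"])

print("F1, words fixed: ", sorted(f1_fixed))
print("F1, words random:", sorted(f1_random))
print("F2, words random:", sorted(f2_random))
print("quasi-F, words random:", sorted(quasi_random))
```

A ratio is a valid test of the treatment effect only when the treatment
component is the sole leftover term; the last line checks the combination
of mean squares that achieves this when words are random.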

A possible solution to this dilemma is to take the "quasi-F" ratio,
or F', which equals
$(MS_p + MS_{s \times w(p)}) / (MS_{p \times s} + MS_{w(p)})$. Now the
only term in the numerator which is not in the denominator is the
$\sigma_p^2$ term. However, F' is only approximately distributed as F,
although the error involved is not large, provided that adjustments are
made to the degrees of freedom.

Another, more conservative solution is minimum F' (min F'), which assumes
that $MS_{s \times w(p)}$ is zero. A more detailed discussion of this
problem may be found in Clark (1973).
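
Both statistics are simple to compute once the mean squares are in hand.
The sketch below uses the Satterthwaite approximation for the adjusted
degrees of freedom and the form min F' = F1 F2 / (F1 + F2) given by Clark
(1973). The mean squares and degrees of freedom in the example are
hypothetical numbers chosen for easy checking, not data from any
experiment discussed here.

```python
def satterthwaite_df(mean_squares, dfs):
    """Approximate d.f. for a sum of independent mean squares."""
    total = sum(mean_squares)
    return total**2 / sum(ms**2 / df for ms, df in zip(mean_squares, dfs))


def quasi_f(ms_p, ms_sxw, ms_pxs, ms_w, df_p, df_sxw, df_pxs, df_w):
    """F' = (MSp + MSsxw(p)) / (MSpxs + MSw(p)) with adjusted d.f."""
    value = (ms_p + ms_sxw) / (ms_pxs + ms_w)
    df_num = satterthwaite_df([ms_p, ms_sxw], [df_p, df_sxw])
    df_den = satterthwaite_df([ms_pxs, ms_w], [df_pxs, df_w])
    return value, df_num, df_den


def min_f(ms_p, ms_pxs, ms_w, df_p, df_pxs, df_w):
    """min F' = F1*F2 / (F1 + F2), i.e. MSp / (MSpxs + MSw(p))."""
    f1 = ms_p / ms_pxs
    f2 = ms_p / ms_w
    value = f1 * f2 / (f1 + f2)
    df_den = (f1 + f2) ** 2 / (f1**2 / df_w + f2**2 / df_pxs)
    return value, df_p, df_den


# Hypothetical layout: 2 parts of speech, 10 subjects, 10 words per part.
example = dict(ms_p=120.0, ms_pxs=10.0, ms_w=30.0, df_p=1, df_pxs=9, df_w=18)
fq, fq_df_num, fq_df_den = quasi_f(ms_sxw=4.0, df_sxw=162, **example)
fm, fm_df_num, fm_df_den = min_f(**example)

print(f"F'     = {fq:.3f} on ({fq_df_num:.1f}, {fq_df_den:.1f}) d.f.")
print(f"min F' = {fm:.3f} on ({fm_df_num}, {fm_df_den:.1f}) d.f.")
```

Because min F' omits the subject by word mean square from the numerator,
it can never exceed F', which is why it is the more conservative of the
two.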

A series of Monte Carlo computer simulations (Forster & Dickinson,
1976) explored the relationship between all of the above F-ratios and
the resulting type I error rates. Generally, F1 and F2 alone
produced unacceptably high error rates, whereas F' and min F' were more
conservative, as can be seen in Table 3.

TABLE 3. Type I Error Rates as a Function of Variation in MSsxp and
MSw(p). (500 observations per situation, alpha = .05, p = 2,
q = 5, r = 9)

Source of Variance
Manipulated           s.d.1   s.d.2   F1      F2      min F'   F'

Neither               0       0       .044    .046    .010     .026

MSw(p)                5       0       .228    .052    .038     .044
                      10      0       .484    .070    .060     .060
                      15      0       .586    .056    .048     .052
                      20      0       .724    .050    .048     .048

MSsxp                 0       5       .042    .146    .024     .036
                      0       10      .064    .388    .048     .042
                      0       15      .036    .520    .032     .034
                      0       20      .042    .588    .038     .042

Both                  5       5       .124    .096    .034     .042
                      10      10      .190    .090    .040     .040
                      15      15      [illegible]  .138    .056     .064
                      20      20      [illegible]  .118    .048     .048

As can be seen in Table 4, increasing the number of items and subjects
tends to decrease type I error for the fixed effects model, where only
subjects are random. Min F' and F' continue to have lower error rates.

TABLE 4. Type I Error Rates as a Function of the Numbers of Subjects and
Items. (300 observations per situation, s.d.1 = s.d.2 = 20, and
alpha = .05.)

Number of Subjects   Number of Items   F1      F2      min F'   F'

10                   5                 .240    .070    .040     .040
10                   20                .090    .290    .053     .053
20                   5                 .307    .077    .067     .067
20                   20                .193    .217    .060     .060


The "Magnitude of Effect" as a Summary Statistic

The F ratio indicates the level of statistical significance that can
be attributed to a particular treatment. The degree of statistical
significance is a joint function of the "true" strength of that factor,
the error variability (which reflects the degree of experimental control),
and the sample size (i.e., number of subjects tested). As sample size
increases, there is increasing power to reject a false null hypothesis.
Thus, in conducting large scale experiments with hundreds of subjects,
the large "n" may be necessary in order to detect a weak "signal" buried
in a background of "noisy" data. But the large n may also lead to
spuriously significant F-ratios which are actually statistical artifacts.

One index for assessing the significance of effects is the "magnitude
of effect," also sometimes referred to as the "proportion of variance
accounted for." It is interesting to note that relatively few research
papers have included this index, compared to the ubiquitous F-ratio.

Basically, the magnitude of effect (m.e.) measures the degree of
association between the independent variable(s) and the dependent
variable(s). In the simplest case for ANOVA having fixed factors, none
of which are repeated, the m.e. formula is:

\[ \text{magnitude of effect} =
   \frac{SS_{effect} - df_{effect} \times MS_{error}}{SS_{total} + MS_{error}} \]

Rules for deriving m.e. indices are provided by Dodd & Schultz (1973),
along with tables for representative ANOVA designs.
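
A short calculation shows how the two summary statistics can diverge. The
sketch below applies the formula above to a hypothetical one-way fixed
design with two groups and 2,000 observations; the sums of squares are
invented for illustration.

```python
def magnitude_of_effect(ss_effect, df_effect, ms_error, ss_total):
    """(SS_effect - df_effect * MS_error) / (SS_total + MS_error)."""
    return (ss_effect - df_effect * ms_error) / (ss_total + ms_error)


# Hypothetical one-way fixed design: 2 groups, 2,000 observations in all.
SS_EFFECT, DF_EFFECT = 20.0, 1
SS_ERROR, DF_ERROR = 1998.0, 1998
SS_TOTAL = SS_EFFECT + SS_ERROR

ms_effect = SS_EFFECT / DF_EFFECT
ms_error = SS_ERROR / DF_ERROR
f_ratio = ms_effect / ms_error
me = magnitude_of_effect(SS_EFFECT, DF_EFFECT, ms_error, SS_TOTAL)

print(f"F({DF_EFFECT}, {DF_ERROR}) = {f_ratio:.2f}")
print(f"magnitude of effect = {me:.4f}")
```

With one numerator degree of freedom and this many error degrees of
freedom, an F of this size exceeds the .001 critical value (about 10.8),
yet the treatment accounts for less than 1% of the total variance.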

The concern of the present paper is with the interpretation of these
summary statistics, since both F and m.e. can be computed from the same
set of data. It is clear that as the statistical significance for a given
effect increases (i.e., the p(observation | null) decreases), the
magnitude for that effect also increases. But it is also possible that an
F-ratio may be highly statistically significant, yet the m.e. for that
effect could account for only some very small proportion of the overall
variance. The results from an experiment summarized in the following
section show that when statistical significance (p < .001) was achieved
by several treatments, the m.e. for these treatments ranged from 1% to
23%.

A Study of Marksmanship

Consider the following experiment, which was conducted for the
U.S. Army Military Police School at Fort McClellan, Alabama. Each of
237 students shot a total of 240 handgun rounds from eight different
position-distance combinations. There were three repetitions of 80 shots
each, at stationary silhouette targets. Within each repetition, five
shots were taken, the weapon was reloaded, and five more shots were
fired in the adjacent test lane. (Each subject had previously passed a
training course with a score of at least 35 hits out of 50 shots.) In
the test, 160 trials (2 repetitions) were taken on Thursdays; the third
repetition was taken on Fridays. The completely crossed design was
therefore:

A x B x C x D, or 237 x 2 x 8 x 3, or subjects x lanes x tables x
repetitions.

Table 5 highlights the results of the ANOVA from this experiment.
The first column of F-ratios assumes a mixed model, with B, C, D as fixed
factors. The second column of F-ratios assumes that only Tables was
a fixed factor. The third F-ratio column assumes that all four factors
were randomly sampled from their respective populations. The point is
rather obvious: different ANOVA models produce different F-ratios for
null hypothesis rejection, given the same set of data.


TABLE 5. Changes in F-Ratios as a Function of ANOVA Model

Source            d.f.(1)   M.S.     F(2)          F(3)         F(4)

A (Subjects)      236       12.80    3.93****      2.54****
B (Lanes)         1         7.70     7.33****      5.96**       2.26
C (Tables)        7         732.71   385.64****    79.11****    79.11***
D (Repetitions)   2         34.75    14.18****     12.55****    4.71**

****: p < .001    ***: p < .01    **: p < .025    *: p < .05

(For Subjects, only two F-ratios are legible in the source; their column
assignment is uncertain.)

1. d.f. for F-ratios were obtained using the Satterthwaite approximation.

2. A random; B, C, D fixed effects.

3. A, B, D random, C fixed.

4. A, B, C, D all random effects.

The problem of interpreting the F-ratios now needs to be addressed.
Is there, for example, a significant effect due to lanes or to
repetitions? If these effects are assumed to be fixed, the answer is yes;
if they are assumed to be random, the answer for lanes is no, and for
repetitions the level of statistical significance has greatly decreased.


We offer the suggestion that the choice of the ANOVA model (and
ultimately the level of significance reached) lies in the eye of the
beholder: the scientist himself. From a sponsor's perspective, it may
well be that only those conditions which are studied in the experiment
are of interest. If many lanes, repetitions, or even tables are never to
be studied or added to his testing program, then those factors would never
be sampled from a larger population of such factors. However, one might
argue from a scientific point of view that many additional lanes,
repetitions, and firing positions could have been tested. That is,
we happen to have chosen only three repetitions, two lanes per subject,
and eight different distance-position combinations. Thus, the
sponsor-practitioner wishes information that is specific to his
particular test. In contrast, the scientific "purist" may perceive this
one test or experiment as merely one of many different kinds which could
have been conducted by him for the sponsor. Hence, the choice of model
indeed influences the significance levels obtained.

The power of the F-ratio to reject a false null hypothesis is a
function of (1) the "true" strength of the particular factor, and (2)
the sample size. Although a large sample size may help to detect a weak
signal in a noisy background, the result of using such a large sample
can lead to increasingly significant F-ratios, with little, if any,
concomitant increase in the m.e. It is to this latter summary statistic
that we now turn our attention, in the analysis of the same set of
marksmanship data.

The m.e. results are shown in Table 6, where it may be seen that the
largest effect, other than random error, was due to the "Tables"
factor, which captured a 23% share of the total score variability.
The effect due to Persons, reflecting individual differences among the
students, reached nearly 10%. Several interaction terms, in which
Tables was a factor, accounted for about 6% to 7%.

TABLE 6. Changes in Magnitude of Effect Index as a Function of ANOVA Model.

                  Proportion of Total Variance, Assuming:

                  A Random,       A,B,D Random,   A,B,C,D
Source            B,C,D Fixed     C Fixed         Random

A (Subjects)      .0852           .1027           .1030
B (Lanes)         .0004           .0006           .0005
C (Tables)        .1643           .2454           .2631
D (Repetitions)   .0027           .0041           .0042

Note that the effect due to Repetitions in Table 5 was statistically
significant, whereas according to Table 6, Repetitions contributed an
effect worth only about .4%. This apparent discrepancy between the two
summary statistics is due to the large number of subjects, which in turn
produced a large number of degrees of freedom. This allows small F-ratios
to more readily achieve statistical significance. Thus, the values for
m.e. in Table 6 act as a check upon the significance levels listed in
Table 5. The effect due to Repetitions therefore reveals a slight, but
probably inconsequential, learning effect. A similar line of reasoning
holds for the interpretation of the Scores variable in Tables 5 and 6.

Summary and Conclusions

In actual experimental testing situations, it may not be easy to
determine whether a given treatment should be classified as a fixed or as
a random effect. For example, in the experiment outlined, the Scores,
Repetitions, and Tables factors could be considered as either fixed
or as random. Recall that Tables had eight levels, representing the
eight specific position-distance combinations that comprise the
marksmanship test. Since there are theoretically an infinite number
of distance-position combinations, Tables could be interpreted as a
sampling of eight from this much larger population. Since an experimenter
is often interested in generalizing his results beyond the specific
treatment levels to a larger set of "real-world" circumstances, a random
effects assignment to Tables could easily be justified. Furthermore,
the probability of falsely rejecting a true null hypothesis is less when
a treatment is considered to be random as opposed to fixed.

In summary, the wise use of an ANOVA model involves the following
points: (1) determination of fixed vs. random factors, (2) computation of
complete sets of summary statistics, (3) interpretation of the statistics.

EXPERIMENTAL DESIGN FOR TESTING EFFECT OF INGESTING
CRUDE FIBER ON PLASMA ZINC LEVELS IN HUMAN VOLUNTEERS

Walter D. Foster, AFIP, and Barbara F. Harland, FDA
Washington, D.C. 20306

ABSTRACT. The benefits of ingesting dietary fiber may be offset
by a possible depression of plasma zinc levels. An experiment was
designed to detect a loss of 10 µg/100 ml in plasma zinc (if it
existed) at the .01 significance level with a power of .95. Variance
estimates were deduced from serum (not plasma) distributions in the
literature and restructured to offer between (and within) subject
variance components. According to the non-central F-distribution,
these design parameters required [number illegible in source] volunteers
to finish the experiment, each with three plasma determinations before
treatment and three more at the end. Treatment consisted of daily
ingestion of bran muffins and bread containing 2.7 grams of crude fiber
for a period of 14 weeks. A similar group of controls ingested this diet
without added fiber.

I. INTRODUCTION AND OBJECTIVES. For at least 20 years, the
scientific literature has noted the general health benefits that might
accrue from the ingestion of crude fiber, with specific emphasis on
crude fiber's potential for reducing the incidence and severity of
atherosclerosis. The popular literature of recent years has
reiterated this theme. Thus, a growing proportion of the reading public
is actively altering diets to include more crude fiber. The
manufacturers of bread and breakfast foods have instituted advertising
programs to sell newly developed, high fiber products.


What is not well known is the possibility of detrimental
effects from increased crude fiber, specifically the excretion of zinc
and other minerals from the body. This problem has been acknowledged
in the medical literature only recently and has been slow to reach
the popular literature and the advertising media.

The Food and Drug Administration bears the responsibility
for monitoring (and regulating, if necessary) the production and
sale of food. To augment the information currently available, FDA
asked for experimentation specifically designed to measure the
decrease (if any) in zinc and other minerals in blood plasma as a
result of the daily ingestion of 2.7 grams of crude fiber in addition
to self-selected diet.

It  is  the  objective  of  this  report  to  describe  in  detail  the 
design  and  suggested  analysis  for  this  experiment  and  to  document 
the experimental protocol selected.

The hypothetical time trend shown in Figure 1 formed the
basis of the plan. Measurements of plasma levels were to be
obtained before treatment. Treatment was defined to be the daily
ingestion of 2.7 grams of crude fiber derived from unprocessed bran,
incorporated into muffins, date bread, and brownies. After a
transition period of several weeks to allow serum levels to reach a new
equilibrium, further plasma determinations were to be obtained.


[Figure 1: plasma zinc over the pre-treatment, transition, and
equilibrium periods, with separate curves for the control and
fiber groups.]

FIGURE 1. Hypothetical time trend of plasma zinc

The specific questions were: 1. duration of transition period;
2. number of subjects in the treatment group; 3. number of
subjects in the control group on the same regimen but without bran;
and 4. number of plasma measurements in the pre-treatment and
equilibrium periods.

II. ESTIMATION OF SAMPLE SIZES. Neither our own experience
nor the literature was helpful in answering objective #1: length
of transition period. Our solution was arbitrary: 12 weeks, a
most conservative estimate to allow for complete transition. Two
weeks were allotted for the pre-treatment baseline testing; two
weeks were added for the equilibrium period, making a total of 16.

Objectives 2-4, how many subjects and how many periods, were
approached simultaneously. The paradigm below shows the detailed
experimental design and suggested analysis of variance, but does
not specify how many subjects and how many periods.


                     Pre-Treatment    Equilibrium
Treatment Group
Control Group

A.V.
GROUPS
DIETS
GxD
SUBJECTS IN G
SxD
PERIODS IN D
GxP
SxP

It was convenient to consider the treatment group alone as an approach to
suggesting the number of subjects and periods.

                          D I E T S
                     Pre-Treatment    Equilibrium
Treatment   A        - - -            - - -
Group       B        - - -            - - -
            .

A.V.             EXPECTED MEAN SQUARE
SUBJECTS         \sigma^2_{SP} + dp\sigma^2_S
DIETS            \sigma^2_{SP} + s\sigma^2_P + p\sigma^2_{SD} + ps\sigma^2_D
SxD              \sigma^2_{SP} + p\sigma^2_{SD}
PERIODS IN D     \sigma^2_{SP} + s\sigma^2_P
SxP              \sigma^2_{SP}

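The problem was to secure estimates of those variance components to be used
to test the effect of Diets and to determine s and p. Design criteria were
defined as follows: require that a difference in plasma level due to diet of
as much as 10 ug/100 ml be statistically significant at the .01 level with the
power of the test set at .95. In terms of the non-central F-distribution, we
have

Non-Central F:  \phi^2 = \frac{\sum_i (\mu_i - \mu)^2 / k}{MS / \text{sample size}}

We set k = 2 and \phi^2 = 3^2 for \alpha = .01, \beta = .05, \delta = 10. Taking
MS as the error mean square for Diets, \sigma^2_{SP} + s\sigma^2_P + p\sigma^2_{SD},
and the sample size as sp,

    \phi^2 = \frac{\delta^2 / 2k}{(\sigma^2_{SP} + s\sigma^2_P + p\sigma^2_{SD}) / sp}

or, solving for p,

    p = \frac{\phi^2 (\sigma^2_{SP}/s + \sigma^2_P)}{\delta^2 / 2k - \phi^2 \sigma^2_{SD}/s}        (1)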
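
As a rough check on the noncentrality value used with \alpha = .01 and \beta = .05, the sketch below applies a large-sample normal approximation for a two-sided comparison of k = 2 means. This approximation is an assumption introduced here for illustration and is not taken from the paper, which presumably relied on tabulated charts; the function name `approx_phi` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_phi(alpha: float, beta: float) -> float:
    """Large-sample approximation to phi for a two-sided test of k = 2 means.

    With one numerator degree of freedom and many error degrees of freedom,
    phi is approximately (z_{1 - alpha/2} + z_{1 - beta}) / sqrt(2).
    """
    z = NormalDist()  # standard normal distribution
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(1 - beta)) / sqrt(2)


phi = approx_phi(alpha=0.01, beta=0.05)
print(round(phi, 2))  # close to 3 under this approximation
```

With few error degrees of freedom the required value is somewhat larger, so this figure should be read as a lower bound rather than the design value.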

Measurements of serum (not plasma) levels repeated in time
for subjects on a steady-state but self-selected diet were available
from Pekarek (72), but not in analysis of variance format. An
approximate reconstruction of Pekarek's data in AV form is shown below.

                 d.f.    MS      EMS
Subjects          98     1834    \sigma^2_{SP} + \sigma^2_P + p\sigma^2_S
Periods in S     728       81    \sigma^2_{SP} + \sigma^2_P

If we assume that p = 827/99 = 8.35, then \sigma^2_{SP} + \sigma^2_P = 81 and
p\sigma^2_S = 1753. These estimates were not out of line with those
reconstructed similarly from other investigators, Davies (69),
Pecoud (75), Halstead (74), and Nichols (76). However, there was
a problem in changing scale from serum values to the expected
equivalent in plasma levels.

A currently used conversion from serum to plasma means is a
simple percentage drop:

    \Delta X = plasma - serum = X_P - X_S = X_S/1.16 - X_S = -.14 X_S

A plot of s(X) vs \bar{X} using both serum and plasma reports revealed
the consistent relation \Delta s = \Delta X / 5, so that

    \Delta s = -.0275 X_S \approx -2.8

for typical serum levels of 100 ug/100 ml. In terms of variances, the
estimates become \sigma^2_{SP} + \sigma^2_P = 38.4 (Plasma). Equation (1)
requires estimates of \sigma^2_{SP}, \sigma^2_{SD}, and \sigma^2_P; thus far,
the literature has yielded only \sigma^2_{SP} + \sigma^2_P.
Table 1 contains values of s and p for a variety of relationships
between \sigma^2_{SP}, \sigma^2_{SD}, and \sigma^2_P in an effort to "box in"
a portion of the hypersurface represented, with the hope that
impracticable values of s and p would be accompanied by unlikely
values of the variances. Clearly a considerable degree of guessing
was involved when the values of s = 14 and p = 3 were chosen from the
center of Table 1. Thus, 14 subjects who would finish the experiment was
a minimum requirement. A similar number was recommended for the
control group with the emphasis on a greater number in the treatment
group if absolute balance was not possible.
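
The arithmetic behind the serum and plasma variance estimates can be retraced in a few lines. This is only a sketch of the calculation as described in the text; the variable names are illustrative, and the shift of -2.8 in the standard deviation is taken directly from the text rather than recomputed.

```python
from math import sqrt

# Average number of determinations per subject in the reconstructed serum data
p_bar = 827 / 99

# Serum-scale estimates quoted in the text
within_serum = 81.0             # sigma^2_SP + sigma^2_P
sigma2_s = 1753.0 / p_bar       # between-subject component, from p * sigma^2_S = 1753

# Shift the within-subject standard deviation from the serum to the plasma scale
sd_serum = sqrt(within_serum)   # 9.0
delta_s = -2.8                  # -.0275 * 100, rounded as in the text
sd_plasma = sd_serum + delta_s
within_plasma = sd_plasma ** 2  # about 38.4 on the plasma scale

print(round(p_bar, 2), round(sigma2_s, 1), round(within_plasma, 1))
```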

III. ALLOCATION OF SUBJECTS. Allocation of the 34 persons
who answered the request for volunteers was based on a balance of
height, weight, sex, level of physical activity, and a measure of
body fat. The physical factors were combined to give an index
number Y as follows:

    Y = 2(t - T) + (w - W)/3, where

    t = triceps skinfold, mm; T = median skinfold for that age, sex;
    w = weight, pounds; W = median weight for height, sex, and frame
    from the Metropolitan Life tables.
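
The index can be written as a small function. The sketch below simply transcribes the formula; the function name, argument names, and the example values are hypothetical and are not data from the study.

```python
def body_index(t_mm: float, T_mm: float, w_lb: float, W_lb: float) -> float:
    """Index Y = 2(t - T) + (w - W)/3.

    t_mm: triceps skinfold of the subject, mm
    T_mm: median skinfold for the subject's age and sex, mm
    w_lb: weight of the subject, pounds
    W_lb: median weight for the subject's height, sex, and frame, pounds
    """
    return 2.0 * (t_mm - T_mm) + (w_lb - W_lb) / 3.0


# Hypothetical subject: skinfold 5 mm above median, weight 15 lb above median
y = body_index(t_mm=20.0, T_mm=15.0, w_lb=170.0, W_lb=155.0)
print(y)
```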

TABLE 1. Values of s, p to meet design criteria for various component
ratios under the restraint \sigma^2_{SP} + \sigma^2_P = 38.4

The index numbers Y were found to be reasonably related to a somewhat
similar index constructed by Lamphiear in Nichols (76). After
ranking the subjects by their index number Y and according to their
level of physical activity, adjacent subjects were allotted to
groups at random. Neither the subjects nor the technicians who
made the plasma determinations knew the group allocations; every
precaution possible was employed to make it truly a blind
experiment.
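
One way to picture the allocation step is as a matched-pairs randomization: rank the subjects, take them two at a time, and let chance decide which member of each pair receives the fiber diet. The sketch below is an assumption about how such a scheme could be coded, not the authors' procedure; it ranks on the index Y only (the paper also used physical activity level), and sending an unpaired subject to the treatment group merely mirrors the stated preference for a larger treatment group.

```python
import random


def allocate_pairs(index_by_subject, seed=None):
    """Rank subjects by index Y and split each adjacent pair at random.

    index_by_subject: dict mapping a subject label to its index number Y.
    Returns (treatment, control) as lists of subject labels.
    """
    rng = random.Random(seed)
    ranked = sorted(index_by_subject, key=index_by_subject.get)
    treatment, control = [], []
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)              # random order within the pair
        treatment.append(pair[0])
        control.append(pair[1])
    if len(ranked) % 2:                # unpaired subject goes to treatment
        treatment.append(ranked[-1])
    return treatment, control


# Hypothetical index values for seven volunteers
subjects = {"a": -4.0, "b": 12.5, "c": 3.0, "d": 0.5, "e": 8.0, "f": -1.0, "g": 20.0}
treatment, control = allocate_pairs(subjects, seed=1)
print(treatment, control)
```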


REFERENCES 


Determination of serum zinc concentrations in normal adult subjects
by atomic absorption spectrophotometry. Robert S. Pekarek, William
Bisel, Peter Bartelloni, Karen Bostian. Am J Clin Path 57:506. 1972.

Measurements of plasma zinc. I. J. T. Davies, M. Musa, and T. L.
Dormandy. J Clin Path 21:359. 1969.

Effect of foodstuffs on the absorption of zinc sulfate. A. Pecoud,
P. Donzel, & J. L. Schelling. Clin Pharmacol & Therapeutics 17:469.
1975.

A conspectus of research on zinc requirements of man. J. A.
Halstead, J. C. Smith, Jr., & M. I. Irwin. J of Nutrition 104:03,
345. March 1974.

Independence of serum lipid levels and dietary habits: The Tecumseh
Study. Allen B. Nichols, Catherine Ravenscroft, Donald E.
Lamphiear, Leon D. Ostrander. JAMA 236: 017, 1948. Oct 1976.

Analysis  of  variance.  Henry  Scheffe.  John  Wiley  & Sons.  New  York 
1959. 


FIELD  VERIFICATION  OF  RADIATION  CHARACTERISTICS 

OF  RADARS 

Aeroballistics Analysis Branch
Aeroballistics  Directorate 
Technology  Laboratory 

US  Army  Missile  Research  and  Development  Command 
Redstone  Arsenal,  Alabama 

ABSTRACT. This paper deals specifically with work done
to determine, from field test data, the radiation patterns of
the radars of the Improved HAWK system. It does not attempt
to treat the subject in general. The problem of data analysis
is the underlying subject of this paper. Many problems were
encountered when doing the analysis which would yield a
radiation pattern. These are discussed. Some results are presented
and conclusions are drawn. The conclusions deal with measures
which will make the job of data analysis easier and quicker,
and should apply generally.


I. INTRODUCTION. In 1975, from July to November, field
tests were conducted with the radars of the Improved HAWK system.
The tests were conducted at Naval Weapons Center (NWC), China
Lake, CA. The tests were motivated by the Anti-Radiation
Missile (ARM) problem. The primary objectives and findings of
the tests are not the subject of this paper. During the tests,
data was collected from which the transmit patterns of the
primary radars could be determined. Pattern data had been made
available by the system prime contractor. This was data taken
on a radar range, in a receive rather than transmit mode, and
in a free space environment, to whatever extent this latter
was achievable. It was felt that the data taken under field
test conditions should be processed to yield the patterns of
the antennas in a transmit mode, in a natural environment (if
China Lake can be judged natural), with multipath present. It
was also felt that the data could be processed in such a way
that it would provide a check point for a multipath model which
had been developed. For these reasons an effort was started to
develop the radar patterns from the data which had been collected.


II.  DATA  COLLECTED.  Figure  1 shows  the  geometry  of  the 
tests  and  test  set-up.  An  RF  sensor  was  mounted  in  the  gondola 
of  a hot  air  balloon.  The  balloon  was  then  permitted  to  rise  to 
various  altitudes  and  as  the  radar  of  interest  was  allowed  to 
rotate  with  its  main  beam  at  a fixed  elevation,  the  output  of 
the  RF  sensor  was  recorded.  Thus,  the  geometry  of  the  radar 
relative to the balloon borne RF sensor was widely variable, from
0 to 360 degrees of azimuth and from almost zero elevation up
to about 60 degrees (the mechanical limit). Also mounted in
the gondola of the balloon, and boresighted with the RF sensor,
were an IR seeker, a television camera, and a riflescope. The
riflescope allowed the operator to point the seeker cluster
toward the radar. The TV camera provided a visual record of
where the seeker cluster was pointed. The IR seeker provided
a quantitative history of where the seeker cluster had been
pointed because an IR source was provided at the radar and
the seeker was gimballed and free to track the IR source. The
IR seeker gimbal angles provide a record of the pointing error
of the seeker cluster.

The  quantities  which  were  recorded  are  the  intensity  output, 
the  two  (right-left  and  up-down)  direction  finding  outputs,  and 
a status  indicator  from  the  RF  sensor;  the  two  gimbal  angles 
from  the  IR  sensor;  and  north  marks  from  the  radars.  Also,  the 
geometric  data  to  relate  the  balloon  position  to  the  radar 
position was recorded. This was named the "Call Out" data
because of the way it was collected and recorded. A person was
stationed with a sextant and he kept sighting on the balloon
and calling out the balloon azimuth and elevation. Someone
would write it down in the log with time of occurrence. The
balloon operator would observe range lines painted on the ground
and call out the range and someone would write it down. There
was also data from an altimeter to be called out and written down.
This handwritten log was the only source of the "Call Out" data.

The other data was recorded on FM tape, copies of which were
furnished  to  MICOM  for  use  in  data  analysis.  Copies  of  the  log 
were  also  furnished.  Some  of  the  FM  tapes  were  digitized  and 
copies  of  these  were  furnished  to  MICOM. 

III. ANALYSIS APPROACH. The problem with analysis was not
so much a problem of approach as of retreat. As soon as some
of the digitized tapes were available at MICOM, people began to
be solicited to "do something" with the data. One young man
started to do something with the data and found that some of the
digital tapes could not be read at all, the rest were digitized
at only 20 samples per second, and that there were chronic tape
reading problems with the computer system which he had chosen to
use. Being a very capable and many-faceted individual, he quickly
found something else to do and has been busy ever since. So it
went, for about a year. Then the author was solicited to "do
something" with the data, and got stuck with it. To abbreviate
the story, the data tapes were digitized by the Test and
Evaluation Directorate of the Research, Development and Engineering
Laboratory of MICOM. The digitization rate was 100 times per
second, and tapes were generated which were compatible with the
CDC 6600 computer system which was chosen for the analysis.
No big problems have been encountered with this part of the
effort, just communications.

Only carefully selected portions of the FM tapes have been
digitized because of the large amount of data which exists. For
a segment of interest, chosen with the aid of the test conductor's
log, the digitized tapes provide the outputs of both the sensors
and the radar north marks, as a function of time. The test
conductor's log is used to make a table of balloon elevation angle,
azimuth angle, and range as a function of the same time base.
These are entered into a computer program which reads the tape,
and then calculates the relative geometry which existed for every
time recorded on the digital tape. To represent the radar
intensity pattern as a function of the relative geometry, angular
space was divided into cells which were 1 degree of elevation and
3 degrees of azimuth. All samples occurring in a particular cell
were then averaged and a standard deviation calculated. The
quantities processed were the intensity output (which indicates
radar pattern) and the direction finding outputs of the RF seeker.
The latter provide information about the multipath situation.
The number of samples which occurred in each cell was also
recorded. For a particular radar, data from several different
days of testing were lumped together if the RF conditions were
the same.
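
The cell-averaging step described above can be sketched as follows. This is an illustrative outline only, not the CDC 6600 program used in the study: the function name and the sample layout are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which form was computed.

```python
from collections import defaultdict
from math import floor
from statistics import mean, pstdev


def bin_samples(samples, el_width=1.0, az_width=3.0):
    """Average samples in angular cells of el_width by az_width degrees.

    samples: iterable of (elevation_deg, azimuth_deg, value) tuples.
    Returns a dict mapping (elevation cell, azimuth cell) to
    (mean, standard deviation, number of samples).
    """
    cells = defaultdict(list)
    for el, az, value in samples:
        key = (floor(el / el_width), floor(az / az_width))
        cells[key].append(value)
    return {key: (mean(v), pstdev(v), len(v)) for key, v in cells.items()}


# Hypothetical intensity samples: two fall in one cell, one in its neighbor
demo = [(10.2, 1.0, 2.0), (10.8, 2.9, 4.0), (10.5, 3.1, 6.0)]
pattern = bin_samples(demo)
for cell, stats in sorted(pattern.items()):
    print(cell, stats)
```

Keeping the sample count with each cell makes it possible to judge later how much weight a given cell mean deserves.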


IV. STATUS. The only analysis which has yet been done is
that just described. No time series approach or spectral analysis
approach has been attempted. The most complete set of results is
for the low altitude search radar. Much less data was recorded
for the illumination radar. No analysis has yet been done with
data from the high altitude search radar.

V. RESULTS.

A. Radiation Pattern. Figure II shows a three dimensional
plot of the intensity data from the RF sensor, for the low
altitude search radar. In this figure 0 relative azimuth means
that the radar beam is pointed in azimuth toward the balloon.
Negative relative azimuth is to the right. The elevation is
balloon elevation angle above the radar beam. Note that the
intensity scale is not provided here. It can be seen that the
most power is with the main beam pointed toward the balloon and
that power decreases with balloon elevation. Data for main beam
on the balloon is not shown here and was not taken in this test.
Other places where no data is shown are at high balloon elevations
where none was recorded, and at a few orientations where there
was insufficient received power at the RF sensor. Figure III
shows a representation of data from the contractor's tests. The
intensity scale is again unspecified, and is different from the
previous figure. The thing which seems worthy of note here is
that the intensity levels in some regions are approximately the
same, but the patterns measured by the contractor show much
steeper gradients. Indeed, the plot is full of spikes. There
is higher intensity in a quite narrow region at zero relative
azimuth for all elevation angles shown. Within about ten degrees
to either side of this region the intensity drops abruptly down
to a region which is approximately 180 degrees total width. In
this region, the intensity spikes seem randomly scattered and
their height decreases roughly linearly as the edge of the region
is approached. Another striking difference between the two plots
is the shape variation with azimuth at a particular elevation. The
field test data is high at zero azimuth, drops for a few degrees
to each side of zero, then rises again and drops again. Some
behavior of this nature can be seen at all elevation angles. The
contractor's data shows this sort of tendency only at approximately
40 degrees elevation angle, and in regions approximately 90° to
either side of zero azimuth. Still another difference is that
the field test data shows intensity to decrease consistently with
balloon elevation, but the contractor's data does not change much
with elevation angle, except at the 40 degrees elevation angle
just discussed. The contractor's data was based upon a single
set of measurements and no averaging was done. Consultation
with people who are experts in the field has revealed that there
may be a good deal of randomness in the structure of a radiation
pattern determined from a single set of measurements. In other
words, if the measurement set were to be repeated by the contractor,
the radiation pattern would not be duplicated, but should have
the same general characteristics. If several measurement sets were
averaged together, then the resulting pattern should be much more
similar to the pattern determined by averaging field test results,
as I have done. This argument would lead to a conclusion that the
examination of the field test data on a scan by scan basis should
reveal a rapidly changing intensity history as the various
radiation spikes are oriented toward the balloon. The field test data
has been inspected on a scan by scan basis and the intensity
variation within a scan does not appear to be of this rapidly
changing nature. In fact, many of the scans have the same
characteristics as the plot of the averaged data. Figure IV
shows three scans of this data. It is thought that the data
recording process (the RF sensor, telemetry process, and stripchart
recorder) does not introduce enough filtering to prevent response
to intensity spikes. But effort is being put forth to determine
whether or not this is true.

B. Multipath Model. The multipath model validation effort
will now be discussed. The model hypothesizes that multipath is
produced by a diffuse type of reflection of the main beam
radiation of this radar. For some radars it might be necessary to
include other high level lobes also. It must be emphasized that
diffuse rather than specular radiation is assumed. The main lobe
is assumed to "paint" a swath of ground as illustrated in Figure V.
This area then becomes a distributed radiator. The model calculates
the area and centroid of the swath and, using empirically derived
data taken at NWC in a previous test, calculates the power
radiated from the swath of ground. Assuming this power
effectively originates from the area centroid of the swath,
the centroid of the direct path and the multipath radiation
combined can be calculated. Obviously this centroid will be
at some point displaced from the radar on a line toward the
centroid of the swath of area. The model would then predict
that the sensed emitter location would revolve around the
actual radar location at the rotation rate of the radar. If
the RF sensors were directly overhead, the sensed emitter
location would be on a circle and the azimuth and elevation
components of the angular error would be equal. In the
general case the elevation angular error is smaller because
the circle appears to be elliptical when viewed at an angle.
This multipath error model has not been extensively validated.
One objective of analysis of the field test data is to
validate the model, or to discover its shortcomings. Figure VI
shows idealized error plots for this low altitude search
radar, at a particular balloon elevation. When the radar
main beam is 90 degrees to the right of the balloon, the
azimuth channel error should be a maximum value and to the
right, while the elevation channel error should be zero.
When the main beam is pointed toward the balloon (or away
from it), the elevation channel error should be a maximum,
and down (or up) and the azimuth channel error should be
zero. Figure VII shows a three dimensional plot of the
azimuth channel error from the field test. It is to be
noted that at a particular elevation angle the error behaves
in the same manner as the idealized error of Figure VI. Figure
VIII shows the elevation channel error, where again the
behavior is as the model would predict, in a qualitative sense.
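
The qualitative behavior described for the idealized error plots can be mimicked with a simple geometric sketch: the sensed emitter location moves on a horizontal circle around the radar, and the component along the line of sight is foreshortened by the sine of the balloon elevation. The functional form, the sign conventions, and the `amplitude` parameter below are assumptions made for illustration and are not the multipath model itself.

```python
from math import cos, radians, sin


def idealized_errors(beam_az_deg, balloon_el_deg, amplitude=1.0):
    """Idealized azimuth and elevation channel errors.

    beam_az_deg: main beam azimuth relative to the balloon direction,
        taken here as positive to the right (an assumed convention).
    balloon_el_deg: balloon elevation angle seen from the radar.
    Returns (az_err, el_err), with positive meaning right and up.
    """
    theta = radians(beam_az_deg)
    foreshortening = sin(radians(balloon_el_deg))  # circle seen as an ellipse
    az_err = amplitude * sin(theta)
    el_err = -amplitude * foreshortening * cos(theta)
    return az_err, el_err


for beam_az in (0.0, 90.0, 180.0):
    az_err, el_err = idealized_errors(beam_az, balloon_el_deg=30.0)
    print(beam_az, round(az_err, 3), round(el_err, 3))
```

Under these conventions the azimuth error peaks to the right with the beam 90 degrees to the right of the balloon, the elevation error peaks downward with the beam on the balloon, and the two amplitudes become equal only for an overhead sensor.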

The  preceding  figures  have  demonstrated  that  multipath 
seems  to  originate  by  diffuse  scattering  of  radiation  from 
the  radar  main  beam  on  the  ground,  because  this  assumption 
seems  to  describe  what  was  observed  in  the  field  test.  The 
comparison  is  qualitative,  however.  The  multipath  model  has 
not  been  exercised  to  see  to  what  degree  it  will  reproduce 
the  field  test  results.  To  do  this,  a good  representation 
of  the  radar  pattern  is  needed.  At  this  point  it  is  not 
clear  what  to  use.  The  pattern  from  the  field  test  data 
contains an intensity contribution from the multipath, and
there are no measurements which are free of multipath, except
the contractor's measurements. These look a good bit different
from the pattern derived from the field test, and it is felt
that the difference cannot be attributed to the multipath
power alone. Also, these would be very difficult to represent.

The next step toward validation of this model is likely
to be the calculation of the multipath intensity contribution
for each angular cell where field test data was collected,
using the model as is. This intensity can then be subtracted
from the intensity measured in this field test and the
difference taken as the radar contribution. The multipath model
can then be used to produce error data for all geometries
of the field test, and this compared to the error data from
the field test. An iterative process could be used to refine
the model.

VI. PROBLEMS. There exists the problem of the radiation
pattern being different from that measured by the contractor.
On one hand, there is the opinion that if the contractor's
facility does not yield the same results as field tests,
then it's no good at all. The other extreme of opinion is
that the agreement is as close as should be expected.

The number of samples which have been averaged to find
mean intensity, and mean angular error components, is variable.
Near the lower and upper extremes of balloon elevation, fewer
samples were taken. This contributes to the raggedness of
the estimates in those regions. In regions where the received
intensity was low there are also fewer samples. In this case,
there is a double contribution to the raggedness of the
estimates because the RF sensor noise becomes more important at
low signal. But, all the available data has been used.
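
The effect of sample count on the raggedness of a cell mean can be illustrated with the usual standard error of a mean. The sketch assumes independent samples within a cell, which is an idealization not claimed in the text, and the numbers are hypothetical.

```python
from math import sqrt


def standard_error(std_dev: float, n_samples: int) -> float:
    """Standard error of a cell mean, assuming independent samples."""
    return std_dev / sqrt(n_samples)


# A well-populated cell versus a sparse, noisier cell (hypothetical values)
well_sampled = standard_error(std_dev=2.0, n_samples=100)
sparse_noisy = standard_error(std_dev=4.0, n_samples=4)
print(well_sampled, sparse_noisy)
```

The second case shows the double contribution: fewer samples and a larger scatter both widen the uncertainty of the mean.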

VII. CONCLUSIONS.

A. A data reduction/analysis plan should be
prepared prior to the test.

B. Where exchange of magnetic tapes is contemplated,
it would be very good to verify compatibility with a pre-test
sample.

C. The person or persons who will ultimately end
up doing the analysis should be intimately involved in test
planning, determination of data requirements, and perhaps the
conduct of the test. At a minimum, he should observe some
typical portions of testing.


CONSTRUCTION OF CONFIDENCE LIMITS
IN  A NONLINEAR  REGRESSION 


C.  MAXSON  GREENLAND 
LYNN  H.  DAVIS 
SYSTEMS  ASSESSMENT  OFFICE 
Chemical  Systems  Laboratory 
Aberdeen  Proving  Ground,  Maryland 

ABSTRACT. This problem was presented in a clinical session at the
Twenty-Third Conference on the Design of Experiments. It arises from
the need to assess the uncertainties associated with calibration curves
which have been fitted to observed data. The discussion includes a
particular nonlinear model for the curve, the regression procedures,
and several attempted methods for calculating 100(1-\alpha)% confidence limits
for the curve. A detailed description is given of an approach outlined
by panel members to whom the problem was presented. Finally, a complete
example is given, including graphical representation of a portion of a
100(1-\alpha)% confidence region in the parameter space, and a description of
the computer work necessary to obtain numerical results.

I. BACKGROUND. Sensitive electronic analyzers which are now in use
are capable of measuring very low concentrations (on the order of 1-15
nanograms per milliliter) of chemical substances in solution. The
uncertainties inherent in the development of calibration curves for this
type of equipment assume great importance in quantitative analyses of
highly toxic materials. At a given significance level, \alpha, a properly
constructed confidence band for a calibration curve is the basis for
obtaining interval estimates of concentration x (the independent variable)
for an observed value y (the dependent variable) of the analyzer output.
An interval of particular interest is determined by the intersection of
the upper confidence limit curve and the Y-axis. This point, y_C, is
called the decision limit since an observed instrument response of less
than or equal magnitude has a non-negligible probability of having been
produced by a zero X-value. The X-value, x_D, corresponding to y_C and
determined by the lower confidence limit curve is called the detection
limit, the lowest value of X which can be distinguished from zero.
Hence, for concentration measurements at a significance level of \alpha, x_D
is the lowest concentration which can be detected, and y_C is the lowest
reading which distinguishes between the presence or absence of a
chemical substance. These relationships, which have been discussed by
Hubaux and Vos (ref. 1), are illustrated for a hypothetical nonlinear
curve in Figure 1.


Figure 1. Calibration curve and confidence band; minimum (X_L),
maximum (X_U) and regression value (\hat{X}) corresponding to chart
reading (Y); decision limit (y_C) and detection limit (x_D).

II. REGRESSION PROCEDURES. The calibration data for analyzer
instruments used in a recent Chemical Systems Laboratory study demonstrated
a configuration similar to Figure 1, where the abscissa represents the
concentration in nanograms/milliliter (1 ng = $10^{-9}$ grams), and the
ordinate represents observed chart readings.

Because of time constraints, the first calibration curves were
developed by means of linear interpolation between points. Later, when
more time became available, a model of the form $y = a + b\ln x$ was
examined; it had the approximate configuration of the data plot and was
linear in the parameters a and b, but y decreases without bound as x
approaches zero. In order to translate the curve to the left so that the
point (0, 10) falls reasonably close to the curve, the following modified
equation was tried:

$$y = a + b\ln(x + k)$$

The unknown parameters were obtained as follows:

1) The parameter k was estimated; then $z = \ln(x + k)$ transformed the
model into $y = a + bz$, which is linear in a and b.

2) The three parameters (p = 3) of the regression curve were
determined by the method of least squares.

3) The value of k was varied in increments of 0.1, and new fits were
calculated by means of an HP25 handheld electronic calculator until a
maximum value of the correlation coefficient was obtained.

4) The equation which produced the greatest value for the correlation
coefficient was the model selected for the calibration curve.
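The four steps above can be sketched in a few lines of Python. This is only
an illustration under stated assumptions: the concentrations in `x` are
hypothetical, the readings in `y` are generated without noise from the
regression equation reported in this paper, and the function name
`fit_log_model` is not from the original study.

```python
import numpy as np

def fit_log_model(x, y, k_grid):
    """For each trial k, fit y = a + b*ln(x + k) by least squares and keep
    the k that gives the largest correlation coefficient."""
    best = None
    for k in k_grid:
        z = np.log(x + k)                 # transformation z = ln(x + k)
        b, a = np.polyfit(z, y, 1)        # straight-line fit y = a + b*z
        r = np.corrcoef(z, y)[0, 1]       # correlation between z and y
        if best is None or r > best[3]:
            best = (a, b, float(k), r)
    return best

# Hypothetical concentrations (ng/ml); readings synthesized from the
# reported equation y = -38.405 + 41.167 ln(x + 3.2), without noise.
x = np.array([0.0, 1.0, 2.0, 4.0, 6.0, 8.0, 11.0, 15.0])
y = -38.405 + 41.167 * np.log(x + 3.2)

k_grid = np.round(np.arange(1, 101) * 0.1, 1)   # k = 0.1, 0.2, ..., 10.0
a, b, k, r = fit_log_model(x, y, k_grid)
print(a, b, k, r)
```

The grid step of 0.1 mirrors step 3; a finer grid or a one-dimensional
minimizer on the residual sum of squares would address the objections the
authors raise about the arbitrary increment and the correlation criterion.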

A representative example of eight data pairs (n = 8) resulted in the
following regression equation:

$$y = -38.405 + 41.167\,\ln(x + 3.2)$$

and a correlation coefficient of 0.9996. The standard error of the
estimate is:

$$S_E = \sqrt{\frac{1}{n-p}\sum_{i=1}^{n}\left(y_i - a - b z_i\right)^2} = 0.9131$$

Then, following a procedure described by Natrella (ref. 2),
$100(1-\alpha)\%$ confidence intervals were calculated for the inverse
function

$$x = \exp\left[\frac{1}{b}\left(\bar{y}' - a\right)\right] - k$$

where $\bar{y}'$ is the average of n' chart readings. The equation for the
interval, which was also programmed on the HP25, is

[interval equation not recoverable from the source scan]

where $C = b^2 - (t_{1-\alpha/2})^2\, s_b^2$
$\bar{X}$ = mean of the observed X-values
$\bar{Y}$ = mean of the observed Y-values
$S_E$ = standard error of the estimate of Y
n = number of calibration observations
n' = number of new observations of Y
$s_b$ = standard error of the estimate of b


Although this appears to be a somewhat more refined approach than
successive linear interpolations, several theoretical objections arise:

1) There is no physical reason to assume an underlying logarithmic
relationship between concentration and the electrical output of the
analyzer.

2) The equation cannot be transformed to one which is linear in the
parameter k.

3) The size of the increments applied to k was arbitrarily chosen.

4) The correlation coefficient is a questionable criterion for
selection of the parameter values.

These considerations led to a search for improved procedures.

In this instance, the analyzer operates on the principle of light
absorption. The intensity of light transmitted through a sample of the
solution decreases as concentration increases and affects the output of a
photocell, which causes the deflection of a continuously recording pen.
The process of radiation absorption is described by the Beer-Lambert Law:

$$I = I_0\, e^{-klx},$$ where

$I_0$ = intensity of light before transmission
$I$ = intensity of transmitted light
$k$ = absorption coefficient
$l$ = length of light path through solution

Assuming a simple linear relationship between intensity of transmitted
light and instrument reading leads to the following:

$$y = \alpha + bI = \alpha + bI_0 e^{-klx} = \alpha + \beta\gamma^{x},$$

with $\beta = bI_0$ and $\gamma = e^{-kl}$.

Note: The symbol $\alpha$ for the parameter should not be confused with the
symbol $\alpha$ for the statistical significance level.




At Chemical Systems Laboratory (CSL) there is available an International
Mathematics and Statistics Library (IMSL) subroutine (ref. 3) which
estimates $\alpha$, $\beta$ and $\gamma$ for this function, calculates the
standard error of the estimate ($S_E$), and determines the
variance-covariance (VCV) matrix for $\alpha$, $\beta$ and $\gamma$. Only
partial details of this proprietary procedure are available, but an
estimate of $\gamma$ is determined iteratively to a specified accuracy
using a Fibonacci technique. Then $\alpha$ and $\beta$ are determined by
the method of least squares. When the data were run on the UNIVAC 1108
computer at CSL, ten iterations of the subroutine gave the regression
equation

$$y = 92.394 - 81.868\,(0.88352)^{x}$$

and $S_E = 0.3167$, which is approximately one-third the value of $S_E$
obtained for the logarithmic model.
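The IMSL routine itself is proprietary and its details are not given here,
so the following Python sketch only illustrates the general idea: a
one-dimensional search on $\gamma$ with $\alpha$ and $\beta$ obtained by
linear least squares at each trial value. A golden-section search stands in
for the Fibonacci technique (an assumption), and the data are synthesized
without noise from the regression equation quoted above.

```python
import numpy as np

def linear_part(x, y, g):
    """For a fixed gamma = g, fit alpha and beta by linear least squares.
    Returns the coefficients and the residual sum of squares."""
    X = np.column_stack([np.ones_like(x), g ** x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, float(resid @ resid)

def fit_exponential(x, y, lo=0.7, hi=0.99, tol=1e-10):
    """Golden-section search on gamma (stand-in for a Fibonacci search)."""
    phi = (np.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c = hi - phi * (hi - lo)
        d = lo + phi * (hi - lo)
        if linear_part(x, y, c)[1] < linear_part(x, y, d)[1]:
            hi = d      # minimum lies in [lo, d]
        else:
            lo = c      # minimum lies in [c, hi]
    g = 0.5 * (lo + hi)
    (alpha, beta), sse = linear_part(x, y, g)
    return alpha, beta, g, sse

# Hypothetical concentrations; readings from y = 92.394 - 81.868(0.88352)^x.
x = np.array([0.0, 1.0, 2.0, 4.0, 6.0, 8.0, 11.0, 15.0])
y = 92.394 - 81.868 * 0.88352 ** x
alpha, beta, gamma, sse = fit_exponential(x, y)
print(alpha, beta, gamma)
```

Because the model is linear in $\alpha$ and $\beta$ once $\gamma$ is fixed,
the search is over a single variable, which is what makes a bracketing
method of this kind practical.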

A method described by Snedecor and Cochran (ref. 4), based on a
Taylor's series expansion of the function, is particularly satisfactory
if good initial estimates of the parameters are available. Consider y
as a function of $\gamma$:

$$y = f(\gamma) = \alpha + \beta\gamma^{x}, \quad\text{where } \gamma \in [k, 1] \text{ and } 0 < k < 1.$$

If f is continuous on [k, 1] and differentiable on (k, 1), and if
$r_1 \in (k, 1)$, then for each $\gamma \in (k, 1)$

$$f(\gamma) = f(r_1) + (\gamma - r_1)\,f'(r_0), \quad\text{where } r_0 \text{ lies between } \gamma \text{ and } r_1.$$

If $r_1$ is chosen very close to $\gamma$, the following approximation
holds:

$$f(\gamma) \approx f(r_1) + (\gamma - r_1)\,f'(r_1)$$

Therefore,

$$y \approx \alpha + \beta r_1^{x} + \beta(\gamma - r_1)\,x\,r_1^{x-1} = \alpha X_0 + \beta X_1 + \delta X_2,$$

where $X_0 = 1$, $X_1 = r_1^{x}$, $X_2 = x\,r_1^{x-1}$, and
$\delta = \beta(\gamma - r_1)$. If the above equation were exact, it would
be possible to obtain estimates $\hat\alpha$, $\hat\beta$ and $\hat\delta$
of the coefficients $\alpha$, $\beta$ and $\delta$; $\hat\gamma$ could then
be calculated. The truncation of the Taylor's series introduces an error;
hence, the calculated values are estimates a, b and c of the estimates
$\hat\alpha$, $\hat\beta$ and $\hat\delta$, respectively. It is then
possible to use procedures applicable to multiple linear regression to fit
the model

$$Y = aX_0 + bX_1 + cX_2$$

where $c = b\,(r_2 - r_1)$.

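If X is the matrix of observed values of the $X_i$, i.e.

$$X = \begin{bmatrix} 1 & r_1^{x_1} & x_1 r_1^{x_1-1} \\ 1 & r_1^{x_2} & x_2 r_1^{x_2-1} \\ \vdots & \vdots & \vdots \\ 1 & r_1^{x_n} & x_n r_1^{x_n-1} \end{bmatrix}, \quad A = \begin{bmatrix} a \\ b \\ c \end{bmatrix}, \quad Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$

and X' is the transpose of X, then the matrix equation $X'XA = X'Y$ can
be solved for A:

$$A = (X'X)^{-1} X'Y$$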

Then, if a, b and $r_2 = r_1 + c/b$ are substituted into the original
equation for $\alpha$, $\beta$ and $\gamma$, respectively, $S_E$ can be
calculated. The procedure is repeated until $S_E$ reaches a minimum value.
In this example the final equation is

$$y = 91.263 - 80.916\,(0.88012)^{x}$$

and $S_E = 0.2581$, a further reduction in the standard error of about 19%.
This two-step approach was used to fit all the calibration curves in
the CSL study.

Since the calibration curves are used to obtain concentration
values from chart readings, the inverse function

$$x = \frac{\ln\left[(y - \alpha)/\beta\right]}{\ln\gamma}$$

provides the necessary transformation.
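The Taylor-series refinement can be sketched as a short iteration in
Python. This is a minimal illustration, not the authors' program: the data
are synthesized without noise from the final equation above, the starting
value is the IMSL estimate of $\gamma$, and the loop stops when the change
in r is negligible rather than by monitoring $S_E$ (an assumed stopping
rule). The function names are hypothetical.

```python
import numpy as np

def taylor_step(x, y, r1):
    """One linearized fit of Y = a*X0 + b*X1 + c*X2 about the trial value r1,
    with X0 = 1, X1 = r1**x, X2 = x*r1**(x-1). Returns a, b and r2 = r1 + c/b."""
    X = np.column_stack([np.ones_like(x), r1 ** x, x * r1 ** (x - 1.0)])
    (a, b, c), *_ = np.linalg.lstsq(X, y, rcond=None)
    return a, b, r1 + c / b

def fit_by_linearization(x, y, r_start, max_iter=50, tol=1e-12):
    """Repeat the linearized fit until r stops changing."""
    r = r_start
    for _ in range(max_iter):
        a, b, r_new = taylor_step(x, y, r)
        if abs(r_new - r) < tol:
            return a, b, r_new
        r = r_new
    return a, b, r

def inverse(y, alpha, beta, gamma):
    """Concentration corresponding to a chart reading y."""
    return np.log((y - alpha) / beta) / np.log(gamma)

# Hypothetical concentrations; readings from y = 91.263 - 80.916(0.88012)^x.
x = np.array([0.0, 1.0, 2.0, 4.0, 6.0, 8.0, 11.0, 15.0])
y = 91.263 - 80.916 * 0.88012 ** x
a, b, r = fit_by_linearization(x, y, r_start=0.88352)
print(a, b, r, inverse(y[3], a, b, r))
```

With real, noisy readings the iteration would converge to the least-squares
estimates rather than to the generating values, and a poor starting value
could prevent convergence, which is why the paper stresses good initial
estimates.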

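III. CONFIDENCE REGIONS

A. Preliminary Calculations

The inverse matrix of Gauss multipliers, $(c_{ij}) = (X'X)^{-1}$, was
used to calculate the standard error of each coefficient:

$$s_a = S_E\sqrt{c_{11}} = 0.7185$$

$$s_b = S_E\sqrt{c_{22}} = 0.6528$$

$$s_r = S_E\left|\frac{c}{b}\right|\sqrt{\frac{c_{33}}{c^2} + \frac{c_{22}}{b^2} - \frac{2c_{23}}{bc}} \approx \frac{S_E\sqrt{c_{33}}}{|b|} = 0.00213$$

Then $100(1-\alpha)\%$ confidence limits for each estimated parameter are
given by

$$a \pm t_{\alpha,\,n-p}\, s_a$$
$$b \pm t_{\alpha,\,n-p}\, s_b$$
$$r \pm t_{\alpha,\,n-p}\, s_r$$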

A simultaneous $100(1-\alpha)\%$ confidence band is required, i.e., a
confidence band which will contain the calibration curve $100(1-\alpha)\%$
of the time.

Breiman (ref. 5) has shown that individual $100(1-\alpha)\%$ confidence
intervals for k parameters form $100(1-k\alpha)\%$ simultaneous confidence
intervals.

Hence, a 95% simultaneous confidence region for the three parameters
$\alpha$, $\beta$ and $\gamma$ requires 98.3% individual intervals. Then
$t_{\alpha,5} = 3.5$, and

$$\alpha \in [88.748,\ 93.778]$$

$$\beta \in [-83.201,\ -78.631]$$

$$\gamma \in [0.87266,\ 0.88758]$$

By selecting combinations of the parametric values within these intervals
which give maximum and minimum values for y, a 95% simultaneous confidence
region for y was calculated. The procedure is relatively crude and
leads to fairly wide intervals. The detection limit is approximately
1 ng/ml, and the interval estimates become wider at the higher
concentrations. A method is needed to calculate improved (more restricted)
confidence regions, if possible.
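The arithmetic behind these intervals is easy to check. The Python sketch
below uses only numbers quoted in this section (the parameter estimates,
their standard errors, and t = 3.5); the function name is hypothetical.

```python
def bonferroni_intervals(estimates, std_errors, t_crit):
    """Individual intervals estimate +/- t*s for each parameter."""
    return {
        name: (est - t_crit * std_errors[name], est + t_crit * std_errors[name])
        for name, est in estimates.items()
    }

estimates = {"alpha": 91.263, "beta": -80.916, "gamma": 0.88012}
std_errors = {"alpha": 0.7185, "beta": 0.6528, "gamma": 0.00213}

# Three parameters at a joint 95% level: each interval is taken at
# 1 - 0.05/3, i.e. about 98.3%.
individual_level = 1.0 - 0.05 / 3.0
intervals = bonferroni_intervals(estimates, std_errors, t_crit=3.5)
print(individual_level, intervals)
```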

B. Suggested Procedure

The following method for determining a $100(1-\alpha)\%$ simultaneous
confidence band for the calibration curve $y = \alpha + \beta\gamma^{x}$
was developed from suggestions of the panel members to whom this problem
was presented at the Twenty-Third Design of Experiments Conference in
Monterey, CA. A new model, $y = 91.269 - 80.921\,(0.88014)^{x}$, based on
24 calibration points, fitted as previously described, is introduced here.

In the preceding section $100(1-\alpha)\%$ individual confidence intervals
were calculated for each parameter. A set of 3 linearly independent
unit vectors, A, B, $\Gamma$, can be considered as an orthonormal basis
for a vector space P called the parameter space. Every point of P can be
written as a linear combination of A, B and $\Gamma$, i.e., as a 3-tuple of
real numbers $(\alpha, \beta, \gamma)$. All points of P whose components
lie within the separate confidence intervals determine a rectangular
parallelepiped in P. An extension of a method described by Draper and
Smith (ref. 6) permits the further restriction of the points
$(\alpha, \beta, \gamma)$ to a subset which represents an approximately
$100(1-\alpha)\%$ simultaneous confidence region C for the three
parameters.

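All points on the boundary of C must satisfy the equation

$$S(\alpha, \beta, \gamma) = S(\hat\alpha, \hat\beta, \hat\gamma)\left[1 + \frac{p}{n-p}\,F(p,\ n-p,\ 1-\alpha)\right]$$

where

$$S(\alpha, \beta, \gamma) = \sum_{i=1}^{n}\left(Y_i - \alpha - \beta\gamma^{X_i}\right)^2$$

and $(\hat\alpha, \hat\beta, \hat\gamma)$ is the point in P whose
components are the parameter values of the calibration curve. Hence, the
right side of the equation is a real-valued number, S, which is a function
of (i) the sum of the squared residuals of the calibration curve, (ii) the
number of parameters to be estimated, p (here p = 3), (iii) the number of
data pairs, n (here n = 24) and (iv) the confidence level, $1-\alpha$
(here $\alpha = 0.05$).

Expanding the equation,

$$S = \sum_{i=1}^{n}\left(Y_i - \alpha - \beta\gamma^{X_i}\right)^2 = \sum_{i=1}^{n}(Y_i - \alpha)^2 - 2\beta\sum_{i=1}^{n}(Y_i - \alpha)\gamma^{X_i} + \beta^2\sum_{i=1}^{n}\gamma^{2X_i}$$

Since $A\beta^2 - 2B\beta + (C - S) = 0$,

$$\beta = \frac{2B \pm \sqrt{4B^2 - 4A(C - S)}}{2A} = \frac{B \pm \sqrt{B^2 - A(C - S)}}{A}$$

where $A = \sum_{i=1}^{n}\gamma^{2X_i}$,
$B = \sum_{i=1}^{n}(Y_i - \alpha)\gamma^{X_i}$, and
$C = \sum_{i=1}^{n}(Y_i - \alpha)^2$.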
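A short Python sketch of this boundary calculation follows. The data are
hypothetical (24 synthetic points scattered about the fitted curve with a
deterministic perturbation), the tabulated value F(3, 21, 0.95) of about
3.07 is passed in as an assumed input, and the function names are
illustrative only. For a given $(\alpha, \gamma)$ pair the two real roots,
when they exist, are the $\beta$ coordinates of the boundary of C.

```python
import numpy as np

def sse(x, y, alpha, beta, gamma):
    """S(alpha, beta, gamma): sum of squared residuals."""
    return float(np.sum((y - alpha - beta * gamma ** x) ** 2))

def critical_sse(sse_min, n, p, f_crit):
    """Right-hand side S = S_min * [1 + p/(n-p) * F(p, n-p, 1-alpha)]."""
    return sse_min * (1.0 + p / (n - p) * f_crit)

def beta_roots(x, y, alpha, gamma, s_crit):
    """Roots of A*beta^2 - 2*B*beta + (C - S) = 0, or None if not real."""
    g = gamma ** x
    A = float(np.sum(g * g))
    B = float(np.sum((y - alpha) * g))
    C = float(np.sum((y - alpha) ** 2))
    disc = B * B - A * (C - s_crit)
    if disc < 0.0:
        return None                      # (alpha, gamma) lies outside C
    root = np.sqrt(disc)
    return (B - root) / A, (B + root) / A

# Hypothetical data: 24 points near y = 91.269 - 80.921(0.88014)^x.
x = np.linspace(0.0, 15.0, 24)
y = 91.269 - 80.921 * 0.88014 ** x + 0.2 * np.sin(3.0 * x)

s0 = sse(x, y, 91.269, -80.921, 0.88014)           # stands in for S at the fit
s_crit = critical_sse(s0, n=24, p=3, f_crit=3.07)  # F value assumed from tables
lo, hi = beta_roots(x, y, 91.269, 0.88014, s_crit)
print(lo, hi)
```

Sweeping $\alpha$ and $\gamma$ over a grid and collecting the root pairs
reproduces the point cloud described in the following paragraphs.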

Since for this example it is simpler to calculate the $\beta$-values from
$\alpha$ and $\gamma$, the 3-tuples will be denoted by
$(\alpha, \gamma, \beta)$ to conform to the usual coordinate convention
(x, y, z) in three-dimensional drawings. A visualization of the parameter
surface is achieved by use of the fact that every point
$(\alpha, \gamma, \beta)$ of P which lies on the surface or interior to
it has real-valued components. As $\alpha$ was held constant, the two real
$\beta$ values were calculated for successively incremented $\gamma$
values; the process was repeated for successive $\alpha$ increments. To
obtain sufficiently small initial values and sufficiently large final
values for $\alpha$ and $\gamma$ to bracket the entire surface, it was
necessary to widen the 98.3% individual confidence intervals by about 25%.
In this example, the 98.3% confidence intervals for $\alpha$ and $\gamma$
are [90.031, 92.508] and [0.87647, 0.88382] respectively; the intervals
[89.800, 92.909] and [0.87544, 0.88454] are sufficiently large to include
the $\alpha$ and $\gamma$ values which apply to C.

Increment sizes of 0.01 and 0.0001 for $\alpha$ and $\gamma$, respectively,
require approximately 3 minutes central processor time to produce
approximately 11,500 points of C. Figure 2 illustrates the mapping
procedure, using a hypothetical sphere as an example.


Figure 2. Illustration of the method used to determine the points on
the surface of a solid figure.

A package of FORTRAN callable subroutines, the Perspective Plotting
System, is available at CSL for producing drawings of perspective views
of three-dimensional objects. Appropriately ordered arrays of data values
are necessary as input to this graphics package. The array of approximately
11,500 $\beta$-values for the region C, which was created as described from
the $(\alpha, \gamma)$ pairs, contains values between -82.311 and -79.620;
hence, C lies in the negative $\beta$ half-space. Attempts to achieve
graphical representations of C using the calculated array were only
partially successful because of limitations of the Perspective Plotting
System.

For example, the coordinates for an acceptable "observer's position" must
be selected, and plots of closed solid figures are not now possible.

However, a perspective drawing of approximately one-half of the region
was produced by means of a two-step transformation on the coordinates.

First, subtraction of the centroid coordinates from each point
$(\alpha, \gamma, \beta_2)$ translated the upper portion of C to the
vicinity of the origin in P.

Second, a transformation matrix applied to the translated points rotated
the figure in such a way that the vector
$[\alpha_{\max} - \bar\alpha,\ \gamma_{\max} - \bar\gamma,\ \beta_{\min} - \bar\beta]$
is rotated into the A$\Gamma$-plane. The net effect is approximately a
one-to-one linear mapping of the points $(\alpha, \gamma, \beta_2)$ onto
points $(\alpha', \gamma', \beta')$, where $\beta' > 0$.

A computer graphics drawing of this object was produced by a
Tektronix 4051 Graphic System using approximately 2700 points. The final
version shown in Figure 3 was produced by means of a CalComp Pen Plotter.
The true scales of the A and B directions have a ratio of approximately
1:1. The ratio of the true scale of A to that of $\Gamma$ is approximately
100:1. Despite the unavoidable distortion of scale in the drawing,
interesting geometrical characteristics of the region are apparent. The
figure appears to have symmetries with respect to certain axes and planes.
Alternating ridges and grooves encircle C in the $\Gamma$-direction.
Analysis of the mathematical properties of the function which defines C
has not been completed.


Figure 3. Upper portion of the 95% confidence region C in the parameter
space P.


For each concentration x, the calibration curve
$y(x) = \alpha + \beta\gamma^{x}$ lies within a $100(1-\alpha)\%$ confidence
band determined by the $100(1-\alpha)\%$ confidence region C in P. To
determine the curves which define the band, it is necessary to calculate,
for each x, the maximum and minimum values of y for all values of the
parameters in the region. It is possible to eliminate from consideration
all points $(\alpha, \gamma, \beta)$ in the interior of C for the following
reasons. The directional derivatives of y in the coordinate directions are

$$D_\alpha y = 1, \qquad D_\beta y = \gamma^{x}, \qquad D_\gamma y = \beta x \gamma^{x-1}$$

For all x and for all parameter values obtained here (note
$\gamma \neq 0$) these derivatives are defined. At no point
$(\alpha_0, \gamma_0, \beta_0)$ is it true that
$D_\alpha y = D_\beta y = D_\gamma y = 0$. Since it is necessary that the
three partial derivatives equal zero simultaneously for an extreme value
of the function to exist at an interior point, it follows that extreme
values of y on the closed region must occur on the boundary of C.

For each x from 0 to 15.1 ng/ml (in increments of 0.01) the value of
y was computed for approximately 11,500 points of C. The maximum and
minimum y values for each x represent the $100(1-\alpha)\%$ confidence
limits for the instrument response. At a 95% confidence level the decision
limit for this curve is 10.74 divisions and the detection limit is
0.08 ng/ml; at the higher concentration levels (about 14.5 ng/ml), the
interval represents an uncertainty of approximately ± 0.3 ng/ml.

The table gives the maximum and minimum chart readings for concentrations
from 0 to 15 ng/ml. Figure 4 is a graph of these points to illustrate
the calibration curve confidence band.
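The envelope calculation can be sketched in Python as follows. The two
boundary points used here are hypothetical stand-ins for the roughly 11,500
points of C, and the function names are illustrative only; the logic
(maximum and minimum of y over the boundary points at each x, decision
limit from the upper curve at x = 0, detection limit where the lower curve
first reaches that value) follows the description above.

```python
import numpy as np

def confidence_band(points, x_grid):
    """points: array of (alpha, gamma, beta) rows on the boundary of C.
    Returns the lower and upper envelopes of y = alpha + beta*gamma**x."""
    alpha = points[:, 0:1]
    gamma = points[:, 1:2]
    beta = points[:, 2:3]
    y = alpha + beta * gamma ** x_grid[None, :]
    return y.min(axis=0), y.max(axis=0)

def decision_and_detection(x_grid, lower, upper):
    """Decision limit y_C = upper band at x = 0 (x_grid must start at 0);
    detection limit x_D = first grid x whose lower band reaches y_C."""
    y_c = upper[0]
    reached = np.nonzero(lower >= y_c)[0]
    x_d = x_grid[reached[0]] if reached.size else None
    return y_c, x_d

# Two hypothetical boundary points (alpha, gamma, beta).
points = np.array([[91.0, 0.88, -81.0],
                   [91.5, 0.88, -81.0]])
x_grid = np.arange(0.0, 1.0, 0.01)
lower, upper = confidence_band(points, x_grid)
y_c, x_d = decision_and_detection(x_grid, lower, upper)
print(y_c, x_d)
```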


TABLE

ANALYZER CHART READINGS

Concentration, x    $Y_{\min}$     $Y_{\max}$     $\hat{Y}$
    (ng/ml)        (divisions)   (divisions)   (divisions)

       0               9.95         10.74         10.35
       1              19.76         20.31         20.05
       2              28.33         28.84         28.58
       3              35.80         36.39         36.10
       4              42.39         43.03         42.71
       5              48.20         48.86         48.53
       6              53.32         53.98         53.65
       7              57.85         58.47         58.16
       8              61.84         62.42         62.13
       9              65.35         65.89         65.62
      10              68.43         68.97         68.70
      11              71.12         71.69         71.40
      12              73.47         74.10         73.78
      13              75.51         76.24         75.86
      14              77.30         78.15         77.72
      15              78.57         79.3[illegible]  79.36

C. Future Investigations

The work described here has involved fitting the model
$y = \alpha + \beta\gamma^{x}$ by least squares and the development of a
numerical method of determining a $100(1-\alpha)\%$ confidence band for the
curve. Continuing investigations will include (i) extension, if possible,
of the methods to additional nonlinear models which are of importance in
testing and other experimental work, (ii) analytic investigation of the
functions, and (iii) development of a computer program to permit a
more complete visualization of closed surfaces of the type encountered
in this study.

COMPUTING THE DEFINITE INTEGRAL

$$\int_0^\infty e^{-(px^2 + qx + r)}\,dx$$

ON A PROGRAMMABLE CALCULATOR

Donald V. Rankin

Army Material Test and Evaluation Directorate
US Army White Sands Missile Range
White Sands Missile Range, New Mexico

ABSTRACT. When a reliability function is expressed by the exponential
of a quadratic form, computation of mean life or mean time to failure
requires evaluation of the definite integral

$$\theta = \int_0^\infty e^{-(px^2 + qx + r)}\,dx.$$

A transformation of variables is effected by completing the square. This
allows $\theta$ to be expressed rather simply in terms of the complementary
error function of the new variable. The latter can be evaluated by either
of two well-known infinite series.

In using these series and, indeed, in selecting which of the two
should be employed in a given case, certain difficulties are met with
and there are some pitfalls to be avoided. A reasonably economical
solution to the problems encountered is found.

I. THE PROBLEM. Recently, in conducting a software reliability
analysis, employment of the modified Schick-Wolverton model was indicated
[5]. This gives rise to the following equation:

$$\mathrm{MTTF} = \theta = \int_0^\infty e^{-\left(ac^2x + \frac{c^2}{4}x^2\right)}\,dx,$$

a and $c^2$ being constants obtained by observation. Solution is by
"completing the square". Thus

$$\theta = \int_0^\infty e^{-c^2\left[-a^2 + a^2 + ax + \frac{x^2}{4}\right]}\,dx = e^{a^2c^2}\int_0^\infty e^{-\left(ca + \frac{c}{2}x\right)^2}\,dx.$$

Let $t = ca + \frac{c}{2}x$. Then $dt = \frac{c}{2}\,dx$. Note that when
x = 0, t = ca, whence

$$\theta = \frac{2}{c}\,e^{a^2c^2}\int_{t=ca}^{\infty} e^{-t^2}\,dt. \qquad (2)$$

In passing, observe that the absence of a constant term in the first
exponent entails no loss of generality.

The error function and its complement are defined:

$$\operatorname{erf} z = \frac{2}{\sqrt{\pi}}\int_0^{z} e^{-t^2}\,dt \quad\text{and}$$

$$\operatorname{erfc} z = \frac{2}{\sqrt{\pi}}\int_z^{\infty} e^{-t^2}\,dt = 1 - \operatorname{erf} z.$$

Thus, setting z = ca,

$$\theta = \frac{\sqrt{\pi}}{c}\,e^{z^2}\operatorname{erfc} z. \qquad (3)$$
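As a quick check on the closed form, the Python sketch below evaluates
$\theta = (\sqrt{\pi}/c)\,e^{z^2}\operatorname{erfc} z$ with z = ca and
compares it with a direct numerical integration of
$e^{-(ac^2x + c^2x^2/4)}$. The exponent is the reading of the garbled
source adopted in this section, so the comparison is under that
assumption; the function names and the sample values of a and c are
illustrative.

```python
import math

def mttf_closed_form(a, c):
    """theta = sqrt(pi)/c * exp(z^2) * erfc(z), with z = c*a.
    exp(z*z) overflows for large z; this sketch is for moderate z only."""
    z = c * a
    return math.sqrt(math.pi) / c * math.exp(z * z) * math.erfc(z)

def mttf_numeric(a, c, upper=60.0, steps=20000):
    """Composite Simpson's rule on [0, upper] for the same integral."""
    def f(x):
        return math.exp(-(a * c * c * x + 0.25 * c * c * x * x))
    h = upper / steps
    total = f(0.0) + f(upper)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * f(i * h)
    return total * h / 3.0

for a, c in [(1.0, 1.0), (0.5, 2.0)]:
    print(a, c, mttf_closed_form(a, c), mttf_numeric(a, c))
```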

II. THE ASYMPTOTIC SERIES FOR erfc z. For large values of z, a
useful asymptotic expansion is

$$\sqrt{\pi}\,e^{z^2}\operatorname{erfc} z \approx \frac{1}{z}\left[1 - \frac{1}{2}z^{-2} + \frac{1\cdot 3}{2^2}z^{-4} - \frac{1\cdot 3\cdot 5}{2^3}z^{-6} + \cdots\right]. \qquad (4)$$

The general term is

$$T_n = (-1)^n\,\frac{1\cdot 3\cdot 5\cdots(2n-1)}{2^n z^{2n+1}},$$

the recurrence ratio

$$\frac{T_n}{T_{n-1}} = -\frac{2n-1}{2z^2}.$$

It is easy to see that the smallest term will occur when
$0 < z^2 - n + \tfrac12 \le 1$, the series diverging after that point.
Using this inequality to identify the smallest term, and truncating the
series immediately thereafter, results in a (nearly) minimum error. The
worst case occurs when $z^2 = n - \tfrac12$, n being the integer subscript
of the smallest term. Some values of the relative error in this sum,
together with the relative value of the smallest term, are tabulated for
illustration:

TABLE 1


$z^2$    $|\eta/c\theta|$   $|T_n/c\theta|$      $z^2$    $|\eta/c\theta|$   $|T_n/c\theta|$

 0.5    1.0000        1.5251           7.5    4.3261 E-4    8.3823 E-4
 1.5    0.23446       0.41149          8.5    1.5741 E-4    3.0609 E-4
 2.5    0.075564      0.13867          9.5    5.7399 E-5    1.1193 E-4
 3.5    2.6047 E-2    4.8859 E-2      10.5    2.0964 E-5    4.0976 E-5
 4.5    9.2158 E-3    1.7514 E-2      11.5    7.6657 E-6    1.5012 E-5
 5.5    3.3030 E-3    6.3317 E-3      12.5    2.8056 E-6    5.5034 E-6
 6.5    1.1926 E-3    2.3002 E-3      13.5    1.0276 E-6    2.0185 E-6
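The truncation rule and the first rows of Table 1 can be reproduced with a
short Python sketch. The function names are illustrative; the reference
value comes from the standard library's `math.erfc`, and the function takes
$z^2$ rather than z so that the worst-case points $z^2 = n - \tfrac12$ are
represented exactly.

```python
import math

def truncated_series(z2):
    """Sum T_0..T_n of the asymptotic series for sqrt(pi)*exp(z^2)*erfc(z),
    stopping right after the smallest term, n = floor(z^2 + 1/2).
    Returns the partial sum and the last term used."""
    z = math.sqrt(z2)
    n = math.floor(z2 + 0.5)
    term = 1.0 / z                      # T_0
    total = term
    for i in range(1, n + 1):
        term *= -(i - 0.5) / z2         # T_i = T_{i-1} * (-(2i-1)/(2 z^2))
        total += term
    return total, term

def exact_c_theta(z2):
    """Reference value of c*theta = sqrt(pi)*exp(z^2)*erfc(z)."""
    return math.sqrt(math.pi) * math.exp(z2) * math.erfc(math.sqrt(z2))

for z2 in (0.5, 1.5, 2.5):
    total, last = truncated_series(z2)
    exact = exact_c_theta(z2)
    print(z2, abs(total - exact) / exact, abs(last) / exact)
```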

III. A SERIES FOR erf z. For small values of z, the infinite series

$$\operatorname{erf} z = \frac{2}{\sqrt{\pi}}\,e^{-z^2}\sum_{n=0}^{\infty}\frac{2^n z^{2n+1}}{1\cdot 3\cdot 5\cdots(2n+1)} \qquad (5)$$

is employed [4]. Although the series converges for all finite values of
z, it is of little practical use when z is large. Convergence is then
very slow (hundreds, even thousands of terms being required) and an
unacceptably high number of significant digits are lost when the
subtraction erfc z = 1 - erf z is performed, even though the computations
be done in multiple precision.

Recalling that $x\Gamma(x) = \Gamma(x + 1)$ and that
$\Gamma(\tfrac12) = \sqrt{\pi}$, we have

$$\Gamma\left(m + \tfrac12\right) = \left(m - \tfrac12\right)\left(m - \tfrac32\right)\cdots\tfrac32\cdot\tfrac12\cdot\Gamma\left(\tfrac12\right),$$

where m is an integer. This can be rewritten

$$\Gamma\left(m + \tfrac12\right) = \frac{1\cdot 3\cdot 5\cdots(2m-1)}{2^m}\sqrt{\pi}.$$

Setting m = n + 1, we can write immediately

$$\operatorname{erf} z = e^{-z^2}\sum_{n=0}^{\infty}\frac{z^{2n+1}}{\Gamma\left(n + \tfrac32\right)}. \qquad (6)$$

The wanted function, of course, is

$$\theta = \frac{\sqrt{\pi}}{c}\,e^{z^2}(1 - \operatorname{erf} z) = \frac{\sqrt{\pi}}{c}\left[e^{z^2} - \sum_{n=0}^{\infty}\frac{z^{2n+1}}{\Gamma\left(n + \tfrac32\right)}\right] = \frac{\sqrt{\pi}}{c}\left[\sum_{n=0}^{\infty}\frac{z^{2n}}{\Gamma(n+1)} - \sum_{n=0}^{\infty}\frac{z^{2n+1}}{\Gamma\left(n + \tfrac32\right)}\right]. \qquad (7)$$

This last form not only points up a problem (that $e^{z^2}$ must be
computed to the same precision as
$\sum z^{2n+1}/\Gamma\left(n + \tfrac32\right)$) but suggests the answer:
the two parts can be summed using the same subroutine, varying only
the first term and the first value of the summing index. This advantage
(programming simplicity) was decisive in the choice of series for erf z,
even though one is known which converges slightly faster [3].

It is interesting to note that a simple change of summing index
produces the elegant form

$$\theta = \frac{\sqrt{\pi}}{c}\sum_{k=0}^{\infty}\frac{(-z)^k}{\Gamma\left(\tfrac{k}{2} + 1\right)}. \qquad (8)$$
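The remark about summing both parts with one subroutine can be illustrated
in Python. In each sum the next term is the previous one times $z^2$
divided by an index that grows by one, so only the first term and the first
index differ. The names `series_sum` and `c_theta_small_z` and the stopping
tolerance are assumptions of this sketch, not the author's calculator
program.

```python
import math

def series_sum(z2, first_term, first_index, tol=1e-17):
    """Sum t_0 + t_1 + ..., where t_k = t_{k-1} * z2 / (first_index + k - 1)."""
    term = first_term
    total = first_term
    index = first_index
    while abs(term) > tol * abs(total):
        term *= z2 / index
        total += term
        index += 1.0
    return total

def c_theta_small_z(z):
    """c*theta = sqrt(pi) * [sum z^(2n)/Gamma(n+1) - sum z^(2n+1)/Gamma(n+3/2)]."""
    z2 = z * z
    even = series_sum(z2, 1.0, 1.0)                            # equals exp(z^2)
    odd = series_sum(z2, 2.0 * z / math.sqrt(math.pi), 1.5)    # first term z/Gamma(3/2)
    return math.sqrt(math.pi) * (even - odd)

z = 1.0
print(c_theta_small_z(z), math.sqrt(math.pi) * math.exp(z * z) * math.erfc(z))
```

Because the result is a difference of two nearly equal sums when z is
large, this form loses significant digits as z grows, which is the problem
quantified in the next table.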


An estimate of the number of significant digits lost by subtraction
is given by

$$\log_{10}\frac{1}{1 - \operatorname{erf} z} = -\log_{10}\operatorname{erfc} z.$$

Some values of $-\log_{10}\operatorname{erfc} z$ are tabulated:

TABLE 2


 z    $-\log_{10}\operatorname{erfc} z$      z    $-\log_{10}\operatorname{erfc} z$      z    $-\log_{10}\operatorname{erfc} z$

1.5       1.470          2.4       3.162          3.3       5.515
1.6       1.626          2.5       3.390          3.4       5.818
1.7       1.790          2.6       3.627          3.5       6.129
1.8       1.962          2.7       3.872          3.6       6.449
1.9       2.142          2.8       4.125          3.7       6.777
2.0       2.330          2.9       4.386          3.8       7.113
2.1       2.526          3.0       4.656          3.9       7.459
2.2       2.730          3.1       4.934          4.0       7.812
2.3       2.942          3.2       5.220          4.1       8.174

Table 2 does not take into account the effect of round-off error in the
individual terms.
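The entries of Table 2 follow directly from the definition and can be
regenerated with the standard library; the function name `digits_lost` is
only illustrative.

```python
import math

def digits_lost(z):
    """Estimated significant digits lost in forming 1 - erf(z)."""
    return -math.log10(math.erfc(z))

for z in (1.5, 2.0, 3.0, 4.1):
    print(z, round(digits_lost(z), 3))
```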


It can be seen at once that, as z increases, significant digits are
lost at an accelerating rate. An actual single-precision program on a
13-digit calculator produced the following result:

TABLE 3

argument range        significant       number of
(value of z)            digits         terms in sum

0.83 to 1.42              10             13 to 20
1.43 to 2.01               9             19 to 26
2.02 to 2.51               8             25 to 33
2.52 to 2.93               7             31 to 38
2.94 to 3.30               6             37 to 43
3.31 to 3.63               5             42 to 48
3.64 to 3.94               4             46 to 53
3.95 to 4.22               3             51 to 57
4.23 to 4.48               2             55 to 61
4.49 to 4.73               1             59 to 65
4.74 to $\infty$        noise only       63 or more

Let us assume there is a requirement to compute to six significant
digits on a machine which computes $e^x$ with a maximum relative error of
$10^{-9}$. For values of the argument up to about 2.33 ($z^2 = 5.43$), the
second series (see eq. 7) can be used, and for values above 3.68
($z^2 = 13.54$), the asymptotic series (see eq. 4) can be used if truncated
after the smallest term. But what is to be done when the argument falls
"in-between" (i.e., when 2.33 < z < 3.68)?

The answer, surprisingly enough, lies in the asymptotic series
itself. Asymptotic series of this type* have a most interesting and
useful property: Provided that the truncated series consists of at
least two terms (i.e., $n \ge 1$), and further provided that the series is
terminated immediately after the smallest term, the approximation ALWAYS
is improved by halving the last term. Performing this operation and
tabulating (see Table 4), it is seen that the improvement, though quite
noticeable, is not yet enough to solve the problem.

TABLE 4

$z^2$    $|\varepsilon/c\theta|$       $z^2$    $|\varepsilon/c\theta|$       $z^2$    $|\varepsilon/c\theta|$

0.5    0.23743         5.5    1.3721 E-4      10.5    4.7599 E-7
1.5    0.028718        6.5    4.2497 E-5      11.5    1.5957 E-7
2.5    6.2314 E-3      7.5    1.3495 E-5      12.5    5.3917 E-8
3.5    1.6177 E-3      8.5    4.3663 E-6      13.5    1.8340 E-8
4.5    4.5884 E-4      9.5    1.4333 E-6      14.5    6.2743 E-9
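The half-term correction is a one-line change to the truncated sum. The
Python sketch below (function names illustrative, reference value from
`math.erfc`) subtracts half of the last term used and reports the relative
residual error, which can be compared with the first entries of Table 4.

```python
import math

def half_term_corrected(z2):
    """Asymptotic series for sqrt(pi)*exp(z^2)*erfc(z), truncated after the
    smallest term (n = floor(z^2 + 1/2)), with that last term halved."""
    z = math.sqrt(z2)
    n = math.floor(z2 + 0.5)
    term = 1.0 / z
    total = term
    for i in range(1, n + 1):
        term *= -(i - 0.5) / z2
        total += term
    return total - 0.5 * term           # halve the last term used

def exact_c_theta(z2):
    return math.sqrt(math.pi) * math.exp(z2) * math.erfc(math.sqrt(z2))

for z2 in (0.5, 1.5, 2.5, 3.5):
    exact = exact_c_theta(z2)
    print(z2, abs(half_term_corrected(z2) - exact) / exact)
```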

It is both interesting and informative to compute and plot the ratio
$\eta/T_n$. (See Figure 1.) Since an alternating series always
"overshoots", the last term used and the error in the partial sum will be
of the same sign, and their ratio will be positive definite. The function

$$f(z) = \frac{\eta}{T_n}$$

is a "saw-tooth", having two values at those points where
$z^2 + \tfrac12$ is an integer. (There are two equal "smallest" terms, and
it is arbitrary whether one or both are used.) It is obvious that the sum
of the two values is unity.

*i.e., with simple terms. Should, say, Bernoulli's numbers appear,
the adjective "useful" may no longer be applicable, due to increased
programming difficulties.

After applying the half-term correction, the remaining error can be
stated as $\varepsilon = \eta - \tfrac12 T_n$, of course. Using a
similarly-formed ratio, we define

$$g(z) = \frac{\eta - \tfrac12 T_n}{T_n} = f(z) - \tfrac12. \qquad (9)$$

Let us tabulate, not g(z), but its reciprocal, at the points where
$z^2 + \tfrac12 = n$ (i.e., an integer), using the greater of the two
values.


TABLE 5

$z^2$     n     $1/g(z)$              $z^2$     n     $1/g(z)$

 0.5     1      6.4234 5156         7.5     8     62.1131 7520
 1.5     2     14.3283 6175         8.5     9     70.1017 6038
 2.5     3     22.2527 7233         9.5    10     78.0924 2300
 3.5     4     30.2036 5992        10.5    11     86.0846 4634
 4.5     5     38.1700 2696        11.5    12     94.0780 7107
 5.5     6     46.1457 3873        12.5    13    102.0724 3985
 6.5     7     54.1274 3674        13.5    14    110.0675 6367

By inspection, we can approximate $1/g(z)$ at these end-points reasonably
well by the function

$$\frac{1}{g^*(n)} = 8n - 2 + \frac{1}{n+1}, \quad\text{from which}$$

$$g^*(n) = \frac{n + 1}{8n^2 + 6n - 1}.$$

The right-hand end of the ramp is then estimated by

$$-g^*(n+1) = \frac{-(n + 2)}{8n^2 + 22n + 13}.$$

V. APPROXIMATING THE RAMP FUNCTION. We can improve both the
notation and the accuracy as follows. Let

$$\zeta = z^2 + \tfrac12 = n + x. \qquad (10)$$

The integer part of $\zeta$ is represented by n, the decimal part by x.
$\zeta$ is a continuous variable, n a discrete one. The general form of the
approximating function is taken to be

$$\frac{1}{g^*(\zeta)} = 8\zeta - 2 + \frac{1}{\zeta + a}$$

which upon development yields

$$g^*(\zeta) = \frac{\zeta + a}{8\zeta^2 + (8a - 2)\zeta + (1 - 2a)}. \qquad (11)$$

A little investigation reveals that in the region of interest (z > 2), a
near-optimum formula is given by assigning the value $a = \tfrac78$. Thus

$$g^*(\zeta) = \frac{\zeta + \tfrac78}{8\zeta^2 + 5\zeta - \tfrac34}. \qquad (12)$$

As a fortunate happenstance, the denominator is factorable, allowing the
expression to be reduced to partial fractions,

$$g^*(\zeta) = \frac{1}{7}\left[\frac{8}{8\zeta - 1} - \frac{1}{8\zeta + 6}\right]. \qquad (13)$$

An extremely close approximation to the ramp is given by

$$\left[1 - 2x + \frac{x - x^2}{\zeta}\right] g^*(\zeta).$$

Adopting the notation $\sum_{i=0}^{n} T_i$ for the finite series truncated
after the smallest term, we find

$$c\theta = \sum_{i=0}^{n} T_i - T_n\left\{\frac12 + \left[1 - 2x + \frac{x - x^2}{\zeta}\right] g^*(\zeta)\right\}. \qquad (14)$$
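Equation (14) can be tried out with a few lines of Python. The sketch uses
the rational form (12) with a = 7/8 for $g^*(\zeta)$ and the ramp factor
$1 - 2x + (x - x^2)/\zeta$, both as read from the damaged source, so it
should be taken as an illustration under that reading. The reference value
uses `math.erfc`; the function names are hypothetical.

```python
import math

def corrected_c_theta(z):
    """c*theta = sqrt(pi)*exp(z^2)*erfc(z) from the asymptotic series with
    the half-term and ramp corrections of eq. (14)."""
    z2 = z * z
    zeta = z2 + 0.5
    n = math.floor(zeta)                # integer part of zeta
    x = zeta - n                        # decimal part of zeta
    term = 1.0 / z                      # T_0
    total = term
    for i in range(1, n + 1):
        term *= -(i - 0.5) / z2
        total += term
    g_star = (zeta + 0.875) / (8.0 * zeta * zeta + 5.0 * zeta - 0.75)  # eq. (12)
    ramp = 1.0 - 2.0 * x + (x - x * x) / zeta
    return total - term * (0.5 + ramp * g_star)

def exact_c_theta(z):
    return math.sqrt(math.pi) * math.exp(z * z) * math.erfc(z)

for z in (2.0, 2.5, 3.0, 3.5):
    print(z, corrected_c_theta(z) - exact_c_theta(z))
```

At z = 2.00 the residual printed by this sketch should be close to the
first entry of the table of worst-case results that follows.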

Some worst-case results are given in Table 6, below.

TABLE 6

Residual error, $\eta$, in $c\theta$ from
"corrected" asymptotic series for erfc z

  z          $\eta$              $\eta/T_n$

2.00    -1.253331 E-7      -9.7940 E-6
2.07     2.103996 E-7      22.3722 E-6
2.17     1.429149 E-7     -24.3138 E-6
2.30    -0.407994 E-7      13.1637 E-6
2.39    -0.279393 E-7     -14.2803 E-6
2.50     8.7352 E-9         8.0323 E-6
2.59     6.0404 E-9        -9.0374 E-6
2.70    -2.0204 E-9         3.6537 E-6
2.78    -1.3582 E-9        -6.0688 E-6
2.88     0.4902 E-9         3.9942 E-6
2.95     3.409 E-10        -4.278 E-6
3.04    -1.248 E-10         2.772 E-6
3.12    -0.833 E-10        -3.103 E-6
3.20     0.328 E-10         2.081 E-6
3.27     0.229 E-10        -2.331 E-6
3.36    -9 E-12             1.69 E-6
3.42    -6 E-12            -1.80 E-6
3.50     3 E-12             1.35 E-6
3.57     2 E-12            -1.47 E-6
3.64    -1 E-12             0.88 E-6

It is found that employment of the "corrective" term extends the use
of the asymptotic series down to an argument of z = 1.99, thereby
overlapping the useful range of the other series and providing a solution
to the six-place problem posed in Section IV. In fact, if "break points"
of z = 2.1 and z = 4.1 are chosen*, the relative error throughout the
whole spectrum probably does not exceed $3.5 \times 10^{-7}$. A program
written for a thirteen-digit calculator, with break points at z = 2.5 and
z = 4.4 (summing the first eleven terms thereafter), produces a value
of $c\theta$ which errs no more than one in the eighth decimal place.

VI. INCREASING THE ACCURACY. In the remote event that additional
accuracy is required, two avenues of approach offer themselves.

a. The calculations can be performed in double- (or triple-)
precision. This will extend the useful range of the argument when
employing the series for erf z. This procedure is NOT recommended, since
it will increase the running time by many orders of magnitude.

b. The accuracy of the "corrective" term can be improved, thereby
extending downward still further the use of the asymptotic series for
erfc z. Since we will be operating in a region where the asymptotic
series contains very few terms anyway, it is unlikely that running time
will be too adversely affected. In pursuit of our goal, two steps are
taken.

1. The degree of the rational expression for g*(ξ) is
increased. It is found to be

$$g^*(\xi) = \frac{\text{[numerator illegible]}}{8\xi^3 + (8a - 2)\xi^2 + (1 - 2a + 8\delta)\xi + a - 2\delta - \tfrac{3}{4}} \qquad (15)$$

Selecting a = 1.2 and δ = 1.05 results in

$$g^*(\xi) = \frac{\text{[numerator illegible]}}{8\xi^3 + 7.6\xi^2 + 7\xi - 1.65}$$

2. More terms are added to the ramp function. Thus

        [equation illegible in source]                              (16)

*When z > 4.1, merely sum the first eleven terms (i = 0, 1, 2, ..., 10)
of the asymptotic series.


        [equation illegible in source]                              (17)


Using these refinements, with break points at 2.34 and 4.77, reduces
the maximum error on a 13-digit calculator to less than 1.7 x 10^-9.
Attempts to further reduce the maximum error will prove to be tedious
and somewhat unrewarding, since the "smallest" term in the asymptotic
series becomes too large to lend itself to the process.
A FRESHMAN ERROR CAN BE FATAL
I'M NOT SO SURE ABOUT BEING 95 PERCENT SURE


NORMAN  L.  WYKOFF 

US  ARMY  JEFFERSON  PROVING  GROUND 
MADISON,  IN  47250 


ABSTRACT. 

The testing of artillery ammunition involves the use of control rounds to
measure the "day-to-day" variations caused by different tubes, recoils and
weather conditions. The control rounds are assembled from components that
have been tested (separately and in combination) in sufficient quantity to
establish the performance characteristics of control components and
complete rounds.

The difficulty comes when a component is nearly depleted and must be
replaced. Unless the match is perfect, the performance characteristics of
the control will shift. The accepted technique thus far has been to check
the match or mismatch using a 95 percent confidence interval for the means
of rounds with the old (n=20) and new (n=20) component. Obviously, this
criterion does little to assure the integrity of the control and thus can
jeopardize troops in the field.

The problem is two-fold: (1) what is an optimal technique, considering
both cost and control integrity; and (2) how can we eliminate the idea
that use of a 95 percent confidence interval means you are almost certain
to make a good choice.


I.  INTRODUCTION: 


Part of the mission of US Army Jefferson Proving Ground is to ballistically
test large caliber ammunition. Statistically, the process is not overly
complicated, but there are many factors that vary, independently and
dependably, that keep the process from being a simple one.

A round of ammunition is a complex machine. There are many components
that must do their particular job in exactly the right way, at exactly the
right time for the complete round to behave properly.


In this example there are eight major components plus the whole assemblage
to be ballistically tested. That is, performance parameters such as
velocity, chamber pressure, target accuracy, range accuracy and/or functioning
must be evaluated for each component when the round is fired from the
appropriate weapon.

There are many different factors that can affect a parameter such as the
velocity of the round. For example, tube wear, recoil system, give of the
earth under the weapon, size of the projectile, type of rotating band,
burning rate of the propellant, and of course the amount of propellant
will each have an effect. Some of these factors cannot readily be measured
and may, in fact, change from trial to trial. The obvious way to estimate
the total of all these extraneous effects is to use a control round. A
control round with a long history of performance, including many very
carefully monitored firings, can be used to estimate the trial-to-trial or
day-to-day variation, as it is usually called. In brief, if the control rounds
have a mean velocity that is 20 foot-seconds lower than normal in a trial,
we assume that the sum total of all those effects yields a 20 foot-second
decrease in the velocity of the test rounds also. Therefore we add 20
foot-seconds to the observed test velocities to "correct" them to standard
conditions.
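
The correction described above amounts to one subtraction and one addition. The following is a minimal sketch with made-up velocities; the function name `correct_to_standard` and all numbers are hypothetical and only mirror the 20 foot-second example in the text.

```python
def correct_to_standard(test_velocities, control_velocities, control_standard_mean):
    """Shift test velocities by the amount the control rounds fell short of
    (or exceeded) their established mean in this trial."""
    control_mean = sum(control_velocities) / len(control_velocities)
    correction = control_standard_mean - control_mean  # +20 if controls read 20 low
    return [v + correction for v in test_velocities]

# Controls have an established mean of 5050 but averaged 5030 in this trial,
# so every observed test velocity is raised by 20.
controls_today = [5028.0, 5032.0, 5030.0]
tests_today = [5100.0, 5110.0, 5095.0]
corrected = correct_to_standard(tests_today, controls_today, 5050.0)
```

The sketch makes the dependence explicit: any error in the established control mean passes directly into every corrected test velocity.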
In order to reduce the number of interactions, a test component is tested
against the control component by loading each into rounds that are
"identical" except for the component being tested. In this way we can measure
the change in performance of the test component from the control component.

By now you can see the dependence on the performance integrity of the
control round for a critical parameter such as velocity. It is exactly this
dependence that creates my concern in this present problem. Before I
describe the problem more fully, let me emphasize that obtaining the long
history on the control rounds is expensive in time, money and material.

II. The Problem: Because of the variety of uses of the control round, one
component may be nearly depleted long before the others. It only makes
sense then, because of economics, to substitute a new lot of the component,
rather than restart the whole process.

Suppose the component in question is the projectile. It obviously has an
effect on the velocity. Incidentally, we will not consider the propellant
since the substitution process is different for the propellant. The
question now is, what is the best procedure to use in substituting a new
projectile lot?


Figure 2 shows the description of the accepted practice.


FIGURE 2

B. FIRING WITH SUBSTITUTE COMPONENTS. The purpose of these firings
is to determine the effect of changing a selected component in the master
or reference established values. When any change of a component in the
reference round is required, the following steps are taken:

a. From engineering judgment and past data decide whether the change
is likely to affect the velocity or pressure level of the round.

b. If a change in velocity or pressure is expected, fire 20 rounds
from the check tube with the old component and 20 with the new, keeping
all other components the same.

c. If the firing in b above is not statistically different
(significance level of 5%), accept the new component.

d. If the firing in b above shows a significant difference, fire
20 additional rounds with the new component and 20 with the old in each
of two tubes with not less than 90 percent life remaining (total: 80
rounds). If this firing also shows a significant difference, discard
the new component and select a second replacement component. Repeat
the test procedure in b until a satisfactory replacement component is
obtained.

e. For multicharge systems, conduct the firings under b at zones at
which ballistic differences would be at a maximum. If the difference is
significant at that charge or charges, follow the procedure of d, above.

f. Before testing a substitute lot, evaluate the performance of
the existing calibration rounds (para 3.3) and submit the evaluation to
ARMCOM.


It leaves quite a bit to the imagination of the reader, doesn't it? Although,
perhaps not too much. The underlying assumption in the process is that the
continuous parameters (velocity in this case) have a normal distribution with
μ and σ unknown and estimable for a given trial only by the results of that
trial. There are a few possible interpretations for the meaning of the
statement above, but the one that seems to have been used by those who have
the task of interpreting it is to use the 2-sample t-test (2-sided). That
is, the test is based on:
This is exactly what you saw in that freshman statistics course a few years
ago. However, hopefully you saw more. You understood that the so-called
95 percent confidence interval given in (3) is an interval big enough to
contain the difference of the sample means (given these values of S1 and
S2) 95 percent of the time if the two samples actually come from the same
population, and you didn't fall prey to the freshman fallacy of
believing that if X̄1 - X̄2 fell in this interval you were 95 percent sure
that μ1 and μ2 were actually the same. If you made this mistake you
probably never did reconcile the implication that the larger 99 percent interval
made you even more certain that the match was good. Of course, we don't
make such errors. Perhaps if the phrase "confidence interval" wasn't used
others wouldn't either. I wish we could change this to a "95 percent
location interval".
The problem is hopefully now clear. The process is good for the seller,
but not for the buyer, and I represent the buyer. To say it a different
way, this is the classic case where α (the probability of rejecting a
test lot that is an exact match) is fixed and β (the probability that
a poorly matched lot is accepted) varies, and for some reason that I
prefer not to put in print, we choose α to be small.

In the following example, the numbers are realistic although they do not
represent actual data. Suppose X̄1, S1, n1 and X̄2, S2, n2 represent the
old and new sample means, standard deviations, and sample sizes. Suppose
further that n1 = n2 = 20, X̄1 = 5050, S1 = S2 = 26.6 (the maximum allowable
value for acceptance tests for this round) and X̄2 = 5067. The acceptance
region is shown in Figure 4 below.

By now some are asking, why not use the location interval based on the
first sample and see if X̄2 falls inside?


FIGURE 4

95% Location Interval Based on 2-Sample Test:
    d from -17.03 to 17.03 about 0; X̄2 from 5032.97 to 5067.03 about 5050

95% Location Interval Based on Sample 1:
    d from -12.45 to 12.45 about 0; X̄2 from 5037.55 to 5062.45 about 5050

95% Location Interval Based on Sample 2:
    d from -12.45 to 12.45 about 0; X̄1 from 5054.55 to 5079.45 about 5067


It seems to be a good idea. It does shorten the interval with no increase
in sample size or cost. I believe, however, that this has not been used
because it doesn't make use of S2 (or S1, if you center the interval about
X̄2) and it is necessary to have samples taken from each of the populations
and therefore both means and standard deviations are available.
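
The half-widths in Figure 4 can be checked with a few lines of arithmetic. This sketch assumes the usual pooled two-sample t interval and the one-sample t interval; the critical values 2.024 (38 degrees of freedom) and 2.093 (19 degrees of freedom) are taken from a standard t table rather than from the paper, and the function names are hypothetical.

```python
import math

# Two-sided 95 percent t critical values from a standard table (assumed).
T_975_DF38 = 2.024  # pooled two-sample test, n1 + n2 - 2 = 38
T_975_DF19 = 2.093  # one-sample interval, n - 1 = 19

def two_sample_half_width(s1, s2, n1, n2, t_crit):
    """Half-width of the acceptance region for the difference of two means."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return t_crit * math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))

def one_sample_half_width(s, n, t_crit):
    """Half-width of the location interval built from a single sample."""
    return t_crit * s / math.sqrt(n)

h_two = two_sample_half_width(26.6, 26.6, 20, 20, T_975_DF38)
h_one = one_sample_half_width(26.6, 20, T_975_DF19)

observed_difference = 5067 - 5050
accepted_by_two_sample = abs(observed_difference) < h_two
accepted_by_one_sample = abs(observed_difference) < h_one
```

With the example numbers, the 17 foot-second difference falls just inside the two-sample region and outside the one-sample interval, which is the contrast Figure 4 illustrates.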


Looking at this another way for the numbers already given.


Quite a change in β from the two-sample to the one-sample technique.
Better, but not good enough for me. I still represent the buyer and I
am very insistent that the product "prove" to me that it is good. In
other words, I want to have a high degree of confidence that the product
is good, the same thing that freshman student thinks he has. I am not
content to reject or not reject the claim that they are the same. You
see, I must think of that poor GI who must use this ammunition to defend
himself. If we are lax and let the velocity level change too much, he
just might miss the tank that is bearing down on him.



There seem to be two things for me to do: first, to convince the people
involved that the one-sample technique is preferable in terms of the
power of the test, even though it doesn't use all the sample information,
and second, to take the best steps to decrease β even more. The two
alternatives for decreasing β are to increase the sample size or increase
α. Both techniques dramatically increase the cost in my application, but
it is difficult to predict the exact amount. My choice is to increase α.
This will result in rejecting more lots of good components, but I will be
more content to accept a lot that passes the more severe test. Increasing
the sample size will increase the confidence in the decision but at a
greater cost in each lot considered.
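
The trade-off between α, n, and β can be explored numerically. The sketch below is only an approximation under stated assumptions: σ is treated as known and equal to 26.6, normal critical values replace t values, the true shift is taken to be 17 foot-seconds, and for the one-sample case the old mean is treated as fixed at its established value. The function names are hypothetical and the results are illustrative, not the values behind the paper's figures.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def beta_two_sample(shift, sigma, n, alpha):
    """P(accept) when the true mean difference is `shift`, two samples of size n."""
    se = sigma * math.sqrt(2.0 / n)                 # std. error of the difference
    half = STD_NORMAL.inv_cdf(1 - alpha / 2) * se   # acceptance half-width
    diff = NormalDist(mu=shift, sigma=se)
    return diff.cdf(half) - diff.cdf(-half)

def beta_one_sample(shift, sigma, n, alpha):
    """P(accept) when the new sample mean is compared with a fixed old mean."""
    se = sigma / math.sqrt(n)                       # std. error of one sample mean
    half = STD_NORMAL.inv_cdf(1 - alpha / 2) * se
    diff = NormalDist(mu=shift, sigma=se)
    return diff.cdf(half) - diff.cdf(-half)

b_two = beta_two_sample(17.0, 26.6, 20, 0.05)        # currently used test
b_one = beta_one_sample(17.0, 26.6, 20, 0.05)        # one-sample technique
b_big_alpha = beta_one_sample(17.0, 26.6, 20, 0.50)  # increase alpha
b_big_n = beta_one_sample(17.0, 26.6, 40, 0.05)      # increase n
```

Under these assumptions the one-sample technique lowers β considerably, and either raising α to 0.50 or doubling n lowers it much further, which is the comparison the next paragraphs take up.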
In Figure 6 we have a comparison of the distribution for α = .50, n = 20
and α = .05 with n = 40 for X̄1 = 5050 and X̄2 = 5067. I like the tight
cutoffs on the first one and the separation on the second. However, Figure 6
only tells part of the story. The question I originally thought I would
pose for the panel is: Which is better, increase α or n?


But after I drew the OC curves for n = 20 with α = .50, n = 40 with α =
.05 and the currently used 2-sample test, I have changed the question to:
How much should I increase α?
ABSTRACT. A laser velocimeter has been used to survey turbulent,
unsteady flows. Data have been analyzed in histogram form. The
time-averaged flow field has been found from ensemble averages. By assuming
stationary flow, the standard deviation and excess give the RMS unsteadiness
in the flow and the statistical uncertainty in the mean and standard deviation.

The calculation of part of the unsteady flow field has been attempted
by a Monte Carlo method. Partial success in explaining bimodal and skewed
histograms has been achieved. This approach has been limited by the necessity
of constructing a hypothetical flow field and the inability to define a
mathematically unique solution.

The definition of power spectra has been achieved for single components
of velocity. Autocorrelation has been chosen to construct the power spectrum
because of the random sample time. Measurements of velocity are available
only when seed particles pass through the sample volume. This is a random
event with a Poisson distribution so that the usual time series analyses are
precluded.

Theory has been developed for cross-correlation and cross-spectral
analyses for two velocity components. However, methods for analysis of
nonstationary flow have not yet been explored.

I. INTRODUCTION. The reduction and interpretation of data acquired by
laser velocimetry in large wind tunnels has illustrated several unique aspects
of the data analysis. The distinctive characteristics of the laser velocimeter
that contribute to the need for new data interpretation techniques are
primarily the ability to calculate errors prior to the test, the acquisition of
discrete, digital measurements, and the randomness of the time between
measurements. The purpose of this paper is to illustrate several techniques
that have been developed specifically for handling laser velocimeter
measurements, to outline the limitations of the present techniques, and to
anticipate opportunities and problems that lie in the immediate future.

In order to define the source of the unique aspects of laser velocimetry,
the apparatus is briefly described. This description is sufficient to explain
the interaction between the error analysis and the histogram moments. Monte
Carlo methods extend the usefulness of the histogram as an interpretative tool.
The second part of the paper deals with the analysis of the time
dependence of the flow. The capabilities of time analysis are linked both to
the manner in which laser velocimeter measurement times relate to the time
scales of the flow and to the method of analysis of the data. The most
general method in use, power spectra, is described in detail. Differences
between laser velocimeter and traditional frequency analyses are identified.
The basic requirements of conditional sampling are outlined, and several
future needs are identified.


II.  APPARATUS 

Example tests: The laser velocimeter has been used in large wind
tunnels at Langley Research Center to measure flow velocities about
aerodynamic models such as wings. Two such test setups are shown in figure 1
(Ref. 1) and figure 2. These particular models are wings at very high angles
of attack (about 19.5°). The two tests used flow velocities of 170 m/sec
and 50 m/sec, respectively. In both cases, measurements were taken of the
two components of velocity which lie in a plane perpendicular to the wing span.
This plane cut the center of the span of the wing. Thus, from figures 1 and 2,
it can be seen that the velocity measurements were made perpendicular to the
laser beams.


Laser velocimeter operation: In order to measure two components of
velocity, three separate laser beams were used (Ref. 1). These beams
intersected at the center span of the wing. The beams were 0.3 mm in diameter so
the volume of intersection (called the sample volume) was about 0.3 mm in
diameter and 1 cm in length. Seed particles that pass through this sample
volume scatter laser light back through the optics system to photomultiplier
tubes. The two photomultiplier-tube outputs are separately checked for
consistency and strength. A signal of sufficient quality will allow one or
both velocity components to be measured, for these
particular tests, with a bias error between -1.33 percent and +0.91 percent
and a ±0.47 percent random uncertainty.

Example data: The example tests required the analysis of the several
million velocity measurements acquired at several hundred points in the
velocity field about an airfoil. Figure 3 shows a section of the wing at the
center span and the directions, labeled U and V, in which the two
components are measured. The tail of each arrow in figures 4 and 5 represents
a measurement point. At each measurement point several hundred (up to 4096)
individual velocity measurements were made in a period that varied from
10 seconds to several minutes.


III. DATA INTERPRETATION BY HISTOGRAM


Histogram moments: The most elementary compact means of presenting laser
velocimeter data is by means of a histogram of ensembles of each component of
velocity. Figure 6 shows four pairs of histograms measured at four points
above the airfoil. The ordinate, C_i, is the percentage of measurements that lie
between U_i - ΔU/2 and U_i + ΔU/2. In this case, ΔU was 2.56 m/sec.
The histogram shape approximates a probability density function, P(U).
Thus, C_i is approximately P(U_i)ΔU. The mean of the probability density
function is equal to the time-averaged velocity, that is

$$U_a = \lim_{T \to \infty} \frac{1}{T} \int_0^T U(t)\,dt = \int_{-\infty}^{\infty} U\,P(U)\,dU$$


The  histogram  mean  approximates  the  time-averaged  velocity  under  the 
following  assumptions: 


1.  The  true  velocity  probability  density  function  is  independent  of 
time  (i.e.  stationary  in  time). 


2. The laser velocimeter is equally likely to measure all velocities,
or else any velocity bias (Ref. 2) has been removed before the formation of
the histogram. Therefore, this source of error, as well as particle tracking
errors, will be ignored in this discussion.

3. The number of velocity measurements, D, is large. The statistical
uncertainty in the mean for a 95 percent confidence limit is given by:

$$\text{Uncertainty in } U_a = \pm \frac{2\sigma}{\sqrt{D}}$$

where σ² is the variance of the histogram.


In a similar manner, the standard deviation, σ, is identified as the
root-mean-square value of U(t) - U_a. The statistical uncertainty in σ for
a 95 percent confidence limit is given by (Ref. 3):

$$\text{Uncertainty in } \sigma = \pm\,\sigma \left[ \frac{2}{D} \left( \frac{E}{2} + 1 \right) \right]^{1/2}$$

where E is the excess (or kurtosis - 3) of the histogram. The uncertainty
in σ is usually an order of magnitude larger than the random error in the
individual velocity measurement. (The random error in individual velocity
measurements was discussed in section II.) For example, if σ is 3.50 m/sec
and the excess is zero and if 2000 measurements were made, then the uncertainty
is 0.11 m/sec.
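
The worked example above can be reproduced directly. The sketch assumes the two-standard-error forms 2σ/√D for the mean and σ[(2/D)(E/2 + 1)]^(1/2) for the standard deviation; the second form is the large-sample variance of a sample standard deviation and is consistent with the quoted 0.11 m/sec. The function names are hypothetical.

```python
import math

def mean_uncertainty(sigma, n_measurements):
    """Approximate 95 percent uncertainty in the histogram mean."""
    return 2.0 * sigma / math.sqrt(n_measurements)

def sigma_uncertainty(sigma, excess, n_measurements):
    """Approximate 95 percent uncertainty in the histogram standard deviation.

    `excess` is kurtosis minus 3, so a Gaussian histogram has excess 0.
    """
    return sigma * math.sqrt((2.0 / n_measurements) * (excess / 2.0 + 1.0))

# Example from the text: sigma = 3.50 m/sec, zero excess, 2000 measurements.
u_sigma = sigma_uncertainty(3.50, 0.0, 2000)
u_mean = mean_uncertainty(3.50, 2000)
```

A histogram with heavier tails (positive excess) gives a larger uncertainty in σ for the same number of measurements, so flat or bimodal histograms need separate care.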


Histogram interpretation: The histograms are extremely useful in the physical
interpretation of the flow field. Figure 7 shows contours of constant resultant
mean velocities, and figure 8 shows contours of constant resultant standard
deviation. The aerodynamic interpretation of these contours was hindered by the
complete lack of any time history or frequency information in the histograms.
For example, the locus of maxima of standard deviation (shown in
figure 8 by a dotted line) may be caused by (1) high levels of random turbulence,
(2) a moving continuous vortex sheet, (3) a series of discrete vortices that
move down the airfoil, or (4) any combination of the above. In order to
distinguish between these possibilities, a Monte Carlo simulation of the histograms
was used.

Monte Carlo simulation: The first step in the Monte Carlo simulation is the
creation of a flow model. A vortex model, shown in figure 9, was hypothesized.
By adjusting the physical parameters (such as vortex strength and height above
the airfoil), calculating the velocities caused by the vortex model, and
simulating the laser velocimeter measuring process, a series of simulated
histograms was generated. Figure 10 shows a comparison, above the 15 percent
chord of the airfoil, of the actual measured pairs of histograms along with
the simulated histograms for each component. The Monte Carlo method has
qualitatively reproduced the measured bimodal histograms. The simulation
reproduces the high velocities in both components at point "a." Although
points "b" through "d" have low measured velocities, the simulation does not
show the lower velocities before points "d" and "e." Point "f" shows good
simulation of the bimodal histogram. For points "g" through "j" a gradual shift
to a low mean velocity is reasonably simulated. Although this and other Monte
Carlo type simulations were considered to be successful, the hypothetical flow
model cannot be accepted with complete assurance because other models could yield
the same result. The Monte Carlo method cannot define a unique time variation
of velocity. This is one of several severe defects in the present method of
histogram interpretation.

Limitations on interpretation: The value of the histograms is augmented
by one of the distinctive characteristics of the laser velocimeter. The laser
velocimeter is an unusual measurement tool in that the errors, both random and
bias, of individual measurements are calculable, and therefore known, before
the experiment begins. Since several hundred, or even several thousand,
individual velocity measurements are available to calculate each value of mean
velocity and standard deviation, the statistical uncertainties of these two
histogram moments are also calculable. However, it has not been possible to
fully utilize the advantages of precalculable errors. For example, the
uncertainty in the higher moments of the histogram, skew and excess, has not
been derived, and their physical significance, apart from indicating large
deviation from a Gaussian shape, is not readily interpretable. Also, it has
not been possible to assign any quantitative degree of certainty to the
histogram shape. Thus, no numerical measure for the goodness-of-fit of the Monte
Carlo simulation and the measured histogram has been found.

The splitting of histograms into a steady (mean) part and unsteady part
is a familiar process. Although each of the two peaks of a bimodal histogram
represents a flow state, there is no analysis available to separate the
unsteadiness in each state so that the two states may be analyzed separately.
The goal of such an analysis should be a means of defining the set of time
histories that may reasonably have yielded the histogram.

Extension of histogram applicability: The difficulties in histogram
interpretation will be compounded when the laser velocimeter is structured to
simultaneously measure both velocity components so that pairs of components,
(U_i, V_i), are recorded. The type of histograms that might result from this
process is shown in figure 11. Although much more information is available in
the two-dimensional histograms, the type of analysis needed to fully utilize
this information is not available.

The need for histogram analysis will not disappear with the newer
laser-velocimeter data-acquisition systems that record time of measurement. The
histogram analysis requires at least an order of magnitude fewer velocity
measurements and measurement rates than frequency spectrum representations.
Conditional sampling techniques yield many histograms. Also, the histogram
analysis will continue to be used for online confirmation of the data validity
and online selection. An optimization of data-acquisition cost may eventually
consist of histogram representation at most points in the velocity field and
selective use of temporal or frequency analyses.

IV. TIMED VELOCITY MEASUREMENTS. In order to analyze velocity data by
time-based methods, it is necessary to record the time lapse, or interarrival
time, between successive velocity measurements. The task of measuring
interarrival times has been performed by a clock with three ranges. For interarrival
times between 0.1 µsec and 6.55 msec, the clock has a resolution of 0.1 µsec.
Using automatic ranging, the clock measures up to 65.5 msec with a resolution
of 1 µsec and up to 0.655 sec with 10 µsec resolution. The typical time scales
for large-scale wind-tunnel power spectrum measurements are shown in Table I.


TABLE I.- TYPICAL TIME SCALES SUITABLE FOR POWER SPECTRA


Maximum resolution of the interarrival clock ........... 0.1 µsec
Instrument reset time between measurements ............. 0.4 µsec
Time required for one velocity to be measured .......... 2 µsec
Residence time of a particle in sample volume .......... 2 to 20 µsec
Average particle arrival time (T/D) .................... 0.5 to 2 msec
Maximum interarrival time measurable by clock .......... 0.655 sec
Measurement period, T .................................. 2 to 100 sec
Since existing instrumentation is capable of making a velocity
measurement about every 2.4 µsec (this depends on particle velocity, Bragg cell
frequency, and fringe spacing), the limiting factor on data rate is the rate
at which particles pass through the sample volume. Since each particle must
pass through 10 fringe planes (spaced, in the second test, 26.5 µm apart) in
order to register a velocity measurement, and since the planes are moving in
the measurement direction at a speed governed by the Bragg cell frequency
(e.g., 132 m/sec), the time required for one velocity measurement is the reset
time plus (10 x fringe spacing)/(Bragg velocity + measured particle velocity).
The average particle interarrival time depends on the average flow velocity,
the diameter of the sample volume (e.g., 0.314 mm), and the density of
particles of measurable size. Although the minimum time between measurements
will vary from 2.4 µsec for various test conditions, it is unlikely to be a
restrictive factor in the data analysis. This can be demonstrated by a
comparison to the average data rate.
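
The 2.4 µsec figure follows from the quantities quoted above. A minimal sketch, using the reset time from Table I, the fringe spacing and Bragg velocity of the second test, and a particle velocity of zero in the measurement direction (the function name and the choice of zero velocity are assumptions for illustration):

```python
def measurement_time(reset_time, fringe_spacing, bragg_velocity,
                     particle_velocity, n_fringes=10):
    """Reset time plus the time for a particle to cross n_fringes moving fringes.

    All quantities in SI units (seconds, meters, meters per second).
    """
    crossing = n_fringes * fringe_spacing / (bragg_velocity + particle_velocity)
    return reset_time + crossing

# 0.4 usec reset, 26.5 um fringe spacing, 132 m/sec fringe (Bragg) velocity.
t_at_rest = measurement_time(0.4e-6, 26.5e-6, 132.0, 0.0)
t_at_50 = measurement_time(0.4e-6, 26.5e-6, 132.0, 50.0)
```

A particle moving with the fringes' sense of measurement shortens the crossing time, so the 2.4 µsec value is close to the worst case for positive measured velocities.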
The arrival of particles through the sample volume has been found to
approximate a Poisson distribution in time (Ref. 4). This distribution takes the form

$$p(n; \lambda \Delta T) = \frac{(\lambda \Delta T)^n e^{-\lambda \Delta T}}{n!}, \qquad n = 0, 1, 2, \ldots$$

where λ is the mean particle arrival rate. In an experimental test case, where
λ was 317 measurements per second, it was found that the limitations imposed by
the system, the minimum interarrival time of 2.4 µsec and the maximum interarrival
time of 0.655 second, pose no limitations on the measurement of interarrival times
(figure 12).
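
For a Poisson arrival process the interarrival times are exponentially distributed, so the fraction of intervals that fall outside the clock's limits can be estimated directly. The sketch below uses the quoted rate of 317 per second; the function names are hypothetical, and an ideal Poisson process is assumed.

```python
import math

def poisson_pmf(n, rate, interval):
    """Probability of exactly n arrivals in `interval` at mean rate `rate`."""
    mu = rate * interval
    return mu**n * math.exp(-mu) / math.factorial(n)

def fraction_outside_limits(rate, t_min, t_max):
    """Fractions of exponential interarrival times shorter than t_min
    and longer than t_max."""
    too_short = -math.expm1(-rate * t_min)   # 1 - exp(-rate * t_min)
    too_long = math.exp(-rate * t_max)
    return too_short, too_long

too_short, too_long = fraction_outside_limits(317.0, 2.4e-6, 0.655)
total_probability = sum(poisson_pmf(n, 317.0, 0.01) for n in range(60))
```

At this rate fewer than one interval in a thousand is shorter than the minimum, and intervals longer than the maximum are vanishingly rare, in line with the conclusion drawn from figure 12.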

V. POWER SPECTRA. The best developed method of presentation of the time
dependence in unsteady flows is power spectra. The most accurate calculation
method that has been found to use the laser velocimetry measurements for power
spectra is an indirect method. The first step is the calculation of a weighted
estimate of the autocovariance. In order to apply a fast Fourier transform to
obtain the power spectra, the autocovariance is extended to form an even function.
This method has been selected over Fourier series methods and periodogram methods
entirely on a trial and error basis (Ref. 4). Its superiority has not been
established analytically, and there is little understanding of the reasons for
the smaller errors that result from the autocovariance approach.

Formation of the autocovariance: The autocovariance estimate $C(k\Delta\tau)$ for $k = 1 \ldots K$ is based on a minimum lag time $\Delta\tau$. The value of $\Delta\tau$ must be greater than the resolution of the interarrival clock. However, much larger values are required to avoid excessive errors. Of course, $K\Delta\tau$ must not exceed the measurement period, $T$.

These limitations on the choice of $\Delta\tau$ and $K$ are the same as they are in determining the autocovariance function of a uniformly sampled data set from a continuous signal. That is, the frequency resolution is determined by $\Delta f = 1/(2K\Delta\tau)$ where the maximum (possible lag) value of $K\Delta\tau$ is the total measurement time, $T$, and the maximum frequency is limited by $f = 1/(2\Delta\tau)$ where $\Delta\tau$ is the minimum time between samples. For the system under consideration, the minimum possible $\Delta\tau$ is 2.4 μsec and the maximum value of $K\Delta\tau$ is related to the average arrival rate by $K\Delta\tau = 4096/\lambda$ where the value 4096 is the maximum number of measurements that can be stored in the memory buffer and $\lambda$ is the mean data sample rate (mean particle arrival rate). However, in a practical random sampling situation, the choice of $K$ and $\Delta\tau$ should be made with regard to the following: 1) desired frequency resolution and resulting variability error, and 2) the value of $\Delta\tau$ must be chosen so that $K$ is fixed at 512 due to data processing limitations. If the chosen value of $\Delta\tau$ is found to be undesirable, the fact that the data is sampled randomly in time allows another choice of $\Delta\tau$ to be made without repeating the experiment.
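
The arithmetic behind these limits can be sketched as follows, using the values quoted later in the paper for figure 13 (a 0.5 msec minimum lag, 512 lags, and about 400 measurements per second). The function names are illustrative only.

```python
def spectral_limits(dtau_s, n_lags):
    """Return (frequency resolution, maximum frequency) in Hz.

    Uses delta_f = 1/(2*K*dtau) and f_max = 1/(2*dtau) as stated in the text.
    """
    delta_f = 1.0 / (2.0 * n_lags * dtau_s)
    f_max = 1.0 / (2.0 * dtau_s)
    return delta_f, f_max


def buffer_duration(n_stored, mean_rate_hz):
    """Mean time in seconds needed to fill a buffer of n_stored measurements."""
    return n_stored / mean_rate_hz


delta_f, f_max = spectral_limits(dtau_s=0.5e-3, n_lags=512)
max_lag_s = 512 * 0.5e-3                 # longest lag actually used, K * dtau
fill_time_s = buffer_duration(4096, 400.0)
print(delta_f, f_max, max_lag_s, fill_time_s)
```

With these inputs the resolution is about 2 Hz and the maximum frequency is 1 kHz, and the longest lag (0.256 sec) is well inside the roughly 10 sec spanned by one 4096-measurement lot.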

To obtain the autocovariance, the mean velocity is subtracted from each measurement to give velocity values $U_i$, $i = 1 \ldots D$. The time delay between two velocity values $U_i$ and $U_j$ has been recorded and is denoted as $t_j - t_i$.

The lag product, $A_u(k)$, is defined as

$$A_u(k) = \sum_{i=1}^{D} \sum_{j=i}^{D} U_i U_j \, S\left[t_j - t_i - (k - \tfrac{1}{2})\Delta\tau\right] S\left[(k + \tfrac{1}{2})\Delta\tau - t_j + t_i\right]$$

where

$$S(x) = \begin{cases} 1 & \text{for } x > 0 \\ 0 & \text{for } x < 0 \end{cases}$$

Thus only those velocities whose interarrival times, $t_j - t_i$, lie between $(k - .5)\Delta\tau$ and $(k + .5)\Delta\tau$ contribute to the lag product.

For efficient computation, each possible pair of velocities is examined; the lag time ratio, $(t_j - t_i)/\Delta\tau$, is calculated; and no action is taken if for that pair the time lag ratio is not less than $K$. Otherwise the product $U_i U_j$ is added to the kth location (where k is the integer nearest the lag time ratio) of the array $A_u(k)$ and the kth location in array $H(k)$ is incremented by one. The accumulations of $A_u(k)$ and $H(k)$ found for several separate periods (e.g., lots of 4096 velocity measurements) of measurement may be summed. The autocovariance is then formed as

$$C(k\Delta\tau) = \frac{A_u(k)}{H(k)} \qquad \text{for } k = 1 \ldots K$$
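
The accumulation just described can be sketched in a few lines. This is a minimal illustration, not the authors' program: the function name is hypothetical, the sample mean stands in for the true mean, arrival times are assumed sorted in ascending order, the zero-lag bin is kept alongside k = 1 ... K, and pairs are dropped when their nearest bin exceeds K.

```python
def slotted_autocovariance(times, velocities, dtau, n_lags):
    """Estimate C(k*dtau), k = 0..n_lags, from randomly timed samples.

    Returns (cov, counts), where counts[k] plays the role of H(k).
    `times` must be sorted in ascending order.
    """
    mean_u = sum(velocities) / len(velocities)
    u = [v - mean_u for v in velocities]       # fluctuating velocities U_i
    lag_sum = [0.0] * (n_lags + 1)             # A_u(k)
    counts = [0] * (n_lags + 1)                # H(k)
    n = len(u)
    for i in range(n):
        for j in range(i, n):
            ratio = (times[j] - times[i]) / dtau
            k = int(ratio + 0.5)               # integer nearest the lag ratio
            if k > n_lags:
                break                          # later samples lie further away
            lag_sum[k] += u[i] * u[j]
            counts[k] += 1
    cov = [s / h if h else float("nan") for s, h in zip(lag_sum, counts)]
    return cov, counts


# Tiny deterministic example: an alternating signal sampled once per second.
cov, counts = slotted_autocovariance(
    times=[0.0, 1.0, 2.0, 3.0],
    velocities=[1.0, -1.0, 1.0, -1.0],
    dtau=1.0,
    n_lags=2,
)
print(cov, counts)
```

Because `lag_sum` and `counts` are plain sums, the results from separate lots can be added element by element before the final division, as the text notes.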
This has been shown to be an unbiased estimate (Ref. 5) if the true mean velocity has been subtracted from the data. The variance of $C(k\Delta\tau)$ is, under very restrictive assumptions,

$$\sigma_k^2 = \frac{\sigma^4 + C^2(k\Delta\tau)}{H(k)}$$

where $\sigma^2$ is the variance of the velocity data. No error estimate is available for spectra that do not describe a stationary broadband Gaussian process. The value of $C(0)$ may be calculated by the above scheme or more simply as

$$C(0) = \sigma^2$$
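
As a hedged illustration of how the variance expression might be used, the sketch below reads it as $\sigma_k^2 = [\sigma^4 + C^2(k\Delta\tau)]/H(k)$ and returns the corresponding standard error. The function name and the numbers in the example are made up for illustration and are not taken from the test data.

```python
import math


def autocovariance_std_error(velocity_variance, cov_k, n_lag_products):
    """Standard error of C(k*dtau), reading the text's expression as
    sqrt((sigma^4 + C^2) / H(k)).

    Only meaningful under the restrictive assumptions named in the text.
    """
    return math.sqrt((velocity_variance ** 2 + cov_k ** 2) / n_lag_products)


# Hypothetical numbers: variance 4 (m/sec)^2, C = 1 (m/sec)^2, 1700 products.
std_err = autocovariance_std_error(4.0, 1.0, 1700)
print(std_err)
```

The error falls as the inverse square root of the number of lag products in a bin, which is why sparsely filled bins of $H(k)$ give noisy autocovariance values.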
Power spectra results: The power spectrum is the Fourier transform of the autocovariance. Figure 13 shows results obtained by calculating a 512 bin autocovariance array, and then defining an additional 512 bins so that an even function is formed. This allows a fast Fourier transform using a Bartlett (triangular) window and a frequency resolution of $\Delta f = 1/(2K\Delta\tau)$ to be used.
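
The transform step can be sketched as below. Because the extended autocovariance is even, its Fourier transform reduces to a cosine sum, which this illustration evaluates directly instead of calling an FFT. The function name, the two-sided normalization by $\Delta\tau$, and the exact form of the triangular weights are assumptions, not details given in the paper.

```python
import math


def spectrum_from_autocovariance(cov, dtau):
    """Cosine transform of an even-extended, Bartlett-windowed autocovariance.

    cov[k] = C(k*dtau) for k = 0..K. Returns (freqs, spectrum) at
    f_m = m / (2*K*dtau), m = 0..K.
    """
    n_lags = len(cov) - 1
    weights = [1.0 - k / n_lags for k in range(n_lags + 1)]  # triangular window
    freqs, spectrum = [], []
    for m in range(n_lags + 1):
        total = cov[0]
        for k in range(1, n_lags + 1):
            # cos(2*pi*f_m*k*dtau) simplifies to cos(pi*m*k/K)
            total += 2.0 * weights[k] * cov[k] * math.cos(math.pi * m * k / n_lags)
        freqs.append(m / (2.0 * n_lags * dtau))
        spectrum.append(dtau * total)
    return freqs, spectrum


# Autocovariance of a unit-variance cosine at 0.25 Hz, with dtau = 1 sec, K = 8.
test_cov = [math.cos(2.0 * math.pi * 0.25 * k) for k in range(9)]
freqs, spectrum = spectrum_from_autocovariance(test_cov, dtau=1.0)
peak_index = max(range(len(spectrum)), key=lambda m: spectrum[m])
print(freqs[peak_index], spectrum[peak_index])
```

For the 512-lag case an FFT of the 1024-point even sequence gives the same values far more cheaply; the direct sum is used here only to keep the sketch short.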

Figure 13(c) is a power spectrum of the $V_L$ velocity component of the circled point in figure 7. About 40,000 measurements, taken at an average data rate of about 400 measurements per second, were used. The minimum lag time, $\Delta\tau$, for the autocovariance was 0.5 msec and 512 values of lag time were calculated. These data were taken in 10 lots with about 4000 measurements in each.
Figure 13(a) shows the distribution of number of lag products, $H(\Delta\tau)$. The histogram bin size is 0.5 msec. The autocovariance is shown in figure 13(b). The spectrum is displayed in figure 13(c) up to the maximum calculated frequency of 1 kHz. The reason for the negative values that occur above 280 Hz is not known. The negative values persisted for the recalculation of the power spectrum for a doubled (1 msec) minimum lag time (Fig. 13(d)). Since negative power is undefined, the source of the anomalous behavior above 280 Hz must be an error.

At this particular data rate, for this 2 Hz frequency resolution, and for 40,000 measurements, a satisfactory spectrum has apparently been calculated up to 280 Hz, although attempts to calculate this spectrum with only 4000 measurements gave very inconsistent curves. There is, unfortunately, no general analysis of the error in the spectra, no theory that explains how the researcher may compensate for a low data rate by increasing the number of measurements, and especially no means of calculating the effects of the known random measurement errors in velocity and time.

Frequency limitation on spectra: Work is proceeding, on an experimental basis, on the maximum frequency limitation. Because of the random interarrival times of the velocity measurements, the Nyquist criterion does not apply to the average data rate. Theoretically, a maximum frequency far exceeding the Nyquist criterion could be achieved. In simulations (Ref. 4) and experiments, attempts to exceed twice the Nyquist criterion frequency have led to erratic spectra.

A method of predicting the data rate, number of samples, and error bounds necessary to achieve a desired frequency limit and accuracy is needed.

Although satisfactory power spectra have been obtained by the indirect method using an autocovariance estimate, it is possible that an improved technique could be devised.

VI. ANALYSIS OF PERIODIC PHENOMENA. As research into the fluid mechanics of turbulent flow has progressed, more patterns have been discerned in flow fields traditionally considered to be random variations in velocity (Ref. 6).

The aerodynamicist must also analyze patterned or organized flow even in the presence of random or broadband velocity fluctuations (Ref. 7). Flows such as those beneath a helicopter rotor (Ref. 8) are difficult to break into three categories as suggested in reference 9: mean flow, organized or patterned or repetitive velocities, and random velocity fluctuations.

The spectrum in figure 14 is taken from a heuristic test in a water tunnel. An oscillating vane imposed a discrete frequency oscillation on the axial flow velocity. The spectrum alone is sufficient to relate the velocity response to the vane oscillation to the magnitude of the random turbulence and confirm the absence of higher harmonics.
The spectrum is not sufficient to define the phase angle between the flow and vane oscillation. This information can be obtained by a conditional sampling technique. The preferred method is to record, at the time of each velocity measurement, the vane angle. A plot of velocity versus vane angle will reveal the phase, the waveform of the response, and the variance of the response at each vane angle.

A similar technique has been applied to a rotor tip vortex flow (Ref. 10) and will be applied to an oscillating airfoil test now under construction. The airfoil is large enough (66 cm in chord) to contain a small laser velocimeter. Figures 15 and 16 show the airfoil mounted in the Langley Transonic Dynamics Tunnel. As the airfoil oscillates, the flow velocity with respect to the airfoil-fixed coordinates and roughly parallel to the surface will be measured. This velocity is expected to be highly nonsinusoidal due to the presence of shock waves. Therefore, the primary data reduction technique will be conditional sampling based on the airfoil angle. Other techniques could be devised (such as basing the condition on the leading edge pressure drop as zero time and plotting against time). Each experimental setup will contain unique conditional sampling techniques but each will share three common demands on the data acquisition and reduction process.

1.  Each  velocity  will  be  linked  to  a condition  of  some  measurement. 

2.  Velocities  must  be  sorted  into  bins  on  the  basis  of  the  condition 
measurement. 

3. Each bin must be analyzed by the techniques developed for histograms.
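
The three demands above can be sketched as a simple binning routine. This is an illustration only: the function name, the equal-width bins, and the choice of count, mean, and variance as the per-bin summary are assumptions, standing in for whatever histogram analysis is applied to each bin.

```python
def conditional_sample(conditions, velocities, low, high, n_bins):
    """Sort velocities into equal-width bins of the condition measurement.

    Returns one (count, mean, variance) tuple per bin; mean and variance are
    NaN when a bin holds too few samples to define them.
    """
    nan = float("nan")
    width = (high - low) / n_bins
    bins = [[] for _ in range(n_bins)]
    for cond, vel in zip(conditions, velocities):     # demand 1: linked pairs
        if low <= cond < high:
            index = min(int((cond - low) / width), n_bins - 1)
            bins[index].append(vel)                   # demand 2: sort into bins
    summary = []
    for values in bins:                               # demand 3: analyze each bin
        count = len(values)
        mean = sum(values) / count if count else nan
        if count >= 2:
            variance = sum((v - mean) ** 2 for v in values) / (count - 1)
        else:
            variance = nan
        summary.append((count, mean, variance))
    return summary


# Hypothetical data: vane angle in degrees paired with velocity in m/sec.
stats = conditional_sample(
    conditions=[10.0, 20.0, 100.0, 110.0, 200.0],
    velocities=[1.0, 3.0, 5.0, 7.0, 9.0],
    low=0.0, high=360.0, n_bins=4,
)
print(stats)
```

Plotting the per-bin mean against the bin centers gives the waveform of the response, and the per-bin variance gives the scatter at each condition value.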

The  use  of  current  and  planned  methods  of  data  analysis  of  laser  velocimeter 
measurements  will  allow  the  aerodynamicist  to  investigate  flow  fields  that  have 
not  been  amenable  to  probe  investigation.  Such  complex  flow  fields  as  helicopter 
rotor  wakes,  separated  and  recirculating  flow  on  airfoils,  and  transition  from 
laminar  to  turbulent  flow  are  especially  likely  candidates  for  laser  velocimeter 
measurements.  Because  of  the  continuing  need  for  evermore  penetrating  analysis 
of  experimental  data  and  because  of  random  phenomena  that  occur  in  each  of 
these  flows,  a need  will  arise  for  the  reduction  of  laser  velocimeter  data  by 
such  techniques  as  temporal  and  spatial  cross-correlation  of  two  velocities, 
time  history  reconstruction,  moving  block  analyses,  and  pattern  recognition. 

The  potentialities  and  difficulties  of  the  more  refined  data-analysis 
techniques  will  become  more  apparent  as  deeper  understanding  of  conditional 
sampling  and  power  spectrum  technique  is  gained.  However,  these  two  techniques 
are  clearly  not  the  end  point  of  laser  velocimeter  data  analysis. 

VII. CONCLUSIONS. The rapid development of the laser velocimeter as a routine tool for flow measurement in large wind tunnels has given rise to new demands for data interpretation and analysis. The distinctive characteristics of laser-velocimeter systems with respect to the traditional flow measurement systems are primarily the following:

1.  Rapid  acquisition  of  thousands  of  individual,  digital  velocity 
measurements  is  possible  at  data  rates  limited,  at  present,  only  by  the 
capacity  of  the  wind-tunnel  seed-particle  injection  process. 

2.  The  seed  velocity  measurement  errors  are  not  only  small  but  they  are 
predictable  before  the  experiment  is  begun. 

3.  The  time  between  measurements  is  a random  variable. 
These distinctive characteristics offer opportunity for more precise control of errors and more efficient and more complete analysis of time-dependent data and real-time data analysis. To achieve these advantages much work remains to be done on both the existing methods of analysis, such as histogram displays and power spectra, and on methods now being developed such as two-component histograms, conditional sampling, and cross-correlation.

Histograms are the most efficient means of data interpretation because of much lower requirements for the amount of data and data rate. Better means of quantizing histogram shape and errors in shape are needed, especially for bimodal and other highly non-Gaussian shapes. Even a shape classification to guide the present cumbersome Monte Carlo methods would be helpful. The use of histograms in the conditional sampling process will compound the need for these analytical tools.

Also urgently needed are better error analyses for power spectra. Any optimization of the calculation process would be very useful.

The  problems  that  will  be  presented  by  the  untried  data  reduction 
techniques  are  not  as  clearly  defined  as  for  the  histogram  and  power  spectra. 
These  problems  may  include  the  reconstruction  of  time  histories  from  data 
with  random  interarrival  times,  error  analysis  of  cross-correlations  using 
noncoincident (in time) measurements of the two velocity components, and
sufficiency  conditions  for  moving  block  analysis. 
Figure 1.- Laser velocimeter measurements in the Langley high-speed 7- by 10-foot wind tunnel.

Figure 2.- Laser velocimeter measurements in the Langley V/STOL wind tunnel.

Figure 5.- Mean velocities above a stalled wing.

Figure 7.- Contours of constant resultant mean velocity normalized by the freestream velocity.

Figure 9.- Model of vortex stress.

Figure 10.- Measured (left 2 columns) and simulated (right 2 columns) histograms above transducer 7. Panels include x = 0.1102 m, y = 0.0894 m and x = 0.1178 m, y = 0.0555 m; axes are U_L and V_L, m/sec.

Figure 13.- Calculation of power spectrum for stalled airfoil at Mach number 0.13. (a) Histogram of number of lag products, H(Δτ), versus lag time, sec. (b) Autocovariance, C(Δτ), m²/sec², versus lag time, Δτ, sec.

Figure 14.- Calculation of power spectrum from water tunnel with oscillating vane. (a) Histogram of number of lag products versus lag time, Δτ, sec. (b) Autocovariance versus lag time, Δτ, sec. (c) Power spectrum of velocity versus frequency, Hz.

Figure 15.- Pitch Rig installed in the Langley Transonic Dynamics Tunnel.
RESOLVING UNDER-IDENTIFICATION THROUGH REPLICATION
IN STRUCTURAL EXPERIMENTAL DESIGN


William S. Mallios
BDM Services Co.


ABSTRACT: Application of structural regression to experimental design often results in under-identification. A remedy, albeit unrealistic, is to assume the structural system is diagonally recursive. Reexamination of this assumption leads to a measure of the degree to which structure has been resolved in a non-recursive system, assuming identification. To assure identification, the experiment should be replicated with one replication used as an instrumental variable for the other.


1. INTRODUCTION.

In the structural regression system

$$A y = \Gamma x + \delta, \qquad (1.1)$$

$y$ is a $p \times 1$ vector of endogenous variables, $x$ is a $q \times 1$ vector of exogenous variables, $A\,(p \times p) = (\alpha_{hh^*})$, where $\alpha_{hh^*}$ is the direct effect¹ of $y_{h^*}$ on $y_h$ and $\alpha_{hh} = 1$, $\Gamma\,(p \times q) = (\gamma_{hi})$, where $\gamma_{hi}$ is the direct effect of $x_i$ on $y_h$, and $\delta\,(p \times 1)$ is the model error vector. Assume that $E(\delta) = 0$ and $E(\delta\delta') = \Sigma_\delta$, i.e., $\delta : (0, \Sigma_\delta)$, where $\Sigma_\delta$ is non-singular and contains finite diagonal elements. Assuming $|A| \neq 0$ and premultiplying (1.1) by $A^{-1}$ yields the reduced regression system

$$y = A^{-1}\Gamma x + A^{-1}\delta = Bx + \varepsilon, \qquad (1.2)$$

where $B\,(p \times q) = (\beta_{hi})$, $\beta_{hi}$ is the overall effect of $x_i$ on $y_h$, and $\varepsilon : (0, \Sigma)$, $\Sigma = A^{-1}\Sigma_\delta A'^{-1}$.

¹See [5] for definitions of direct, indirect, and overall effects.
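
To make the distinction between direct and overall effects concrete, the sketch below forms $B = A^{-1}\Gamma$ for a small two-model system. The coefficients are invented for illustration, the helper names are hypothetical, and the off-diagonal entries of $A$ are taken as the negated feedback coefficients, following the way the two-model system is written out in Section 3.

```python
def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("matrix is singular, so no reduced form exists")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(p, q):
    """Product of two matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]


# Hypothetical structure:  y1 = 0.50*y2 + 2*x1 + d1
#                          y2 = 0.25*y1 + 4*x2 + d2
A = [[1.0, -0.5], [-0.25, 1.0]]
Gamma = [[2.0, 0.0], [0.0, 4.0]]

B = matmul(inverse_2x2(A), Gamma)   # overall effects, B = A^{-1} Gamma
print(B)
```

In this example $x_2$ has no direct effect on $y_1$ (the corresponding entry of $\Gamma$ is zero), yet its overall effect `B[0][1]` is nonzero because it acts through $y_2$.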

Let $y_h$ and $x_i$ denote $n \times 1$ vectors of observations on $y_h$ and $x_i$. Then the h-th model of (1.1) is written as

$$y_h = Y_h \alpha_h + X_h \gamma_h + \delta_h = Z_h \theta_h + \delta_h, \qquad (1.3)$$

where $Y_h\,(n \times p_h) = (y_{h^*})$, $X_h\,(n \times q_h) = (x_i)$, $Z_h = (Y_h \mid X_h)$, and $\theta_h' = (\alpha_h', \gamma_h')$. $I_n$ denotes the $n \times n$ identity matrix in the assumption $\delta_h : (0, \sigma_{\delta_h}^2 I_n)$.

Letting $X$ denote the $n \times q$ matrix of all controlled variables, the ordinary least squares (OLS) estimate of $B$ is

$$\hat{B} = (X'X)^{-1}X'Y. \qquad (1.4)$$

$\Sigma$ is estimated by

$$S = (Y - X\hat{B})'(Y - X\hat{B})/(n - q). \qquad (1.5)$$

Assuming identification [2, 3, 5], the structural parameters can be estimated indirectly through the reduced system by equating $\hat{B}$ to $B = A^{-1}\Gamma$ and $S$ to $\Sigma = A^{-1}\Sigma_\delta A'^{-1}$. Alternatively², $\theta_h$ can be estimated directly through two stage least squares (2SLS) estimation as follows:

²See [2, 3] for other estimation techniques.
$$\hat{\theta}_h = [(Z_h'X)(X'X)^{-1}(X'Z_h)]^{-1}(Z_h'X)(X'X)^{-1}X'y_h; \qquad \mathrm{var}(\hat{\theta}_h) = [(Z_h'X)(X'X)^{-1}(X'Z_h)]^{-1}\sigma_{\delta_h}^2, \qquad (1.6)$$

where $\sigma_{\delta_h}^2$ is estimated by $s_h^2 = (y_h - Z_h\hat{\theta}_h)'(y_h - Z_h\hat{\theta}_h)/(n - p_h - q_h)$.

This estimate derives its name from a conceptual, two-fold application of OLS; i.e., in the reduced regression for $Y_h$, given by $E(Y_h) = XB_h$, $B_h$ is estimated by (1.4); after replacing $Y_h$ by $\hat{Y}_h = X\hat{B}_h$ in (1.3), OLS is applied a second time which leads to the result in (1.6); in this process, the uncontrolled $Y_h$ has been replaced by a consistent estimate, $\hat{Y}_h$, which is treated statistically as if it were controlled; see [3, p. 153].
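
The two-fold use of OLS can be sketched for the simplest case of one endogenous regressor and one controlled variable that serves only as an instrument. This is an illustration of the idea, not the general matrix estimate in (1.6); the function names and the small data set are invented, and the data are built so that the model error is correlated with the regressor but exactly uncorrelated with the instrument.

```python
def ols_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def two_stage_least_squares(instrument, endogenous, response):
    """Stage 1: regress the endogenous variable on the instrument.
    Stage 2: regress the response on the stage-1 fitted values."""
    a1, b1 = ols_line(instrument, endogenous)
    fitted = [a1 + b1 * z for z in instrument]
    return ols_line(fitted, response)


# Hypothetical data: true structural relation is y = 1 + 3*x + error,
# where the error equals the disturbance u that also enters x.
z = [-1.0, -1.0, 1.0, 1.0]                      # controlled variable
u = [1.0, -1.0, 1.0, -1.0]                      # disturbance shared by x and y
x = [2.0 * zi + ui for zi, ui in zip(z, u)]     # uncontrolled regressor
y = [1.0 + 3.0 * xi + ui for xi, ui in zip(x, u)]

ols_intercept, ols_slope = ols_line(x, y)
tsls_intercept, tsls_slope = two_stage_least_squares(z, x, y)
print(ols_slope, tsls_slope)
```

Direct OLS gives a slope of 3.2 because the regressor carries the model error, while the two-stage estimate recovers the structural slope of 3 in this constructed example.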


2. COMMENTS ON THE ASSUMPTION OF A DIAGONAL $\Sigma_\delta$.

In (1.3) a natural question is regarding the appropriateness of estimating $\theta_h$ by $(Z_h'Z_h)^{-1}Z_h'y_h$, the OLS estimate. When $A$ is triangular and $\Sigma_\delta$ is diagonal, the structural system is termed diagonally recursive [2, 4, 5] and the OLS estimate is consistent. Regarding the plausibility of the diagonally recursive assumption, a triangular $A$ might be realistic so long as the experiment is designed with this restriction in mind; i.e., a triangular $A$ implies that, during the course of the experiment, no causal feedbacks occur between any two $y_h$ and $y_{h^*}$ and that no variable has an indirect effect on itself [5]. However, an assumed diagonal $\Sigma_\delta$ has far reaching implications. Under a diagonal $\Sigma_\delta$, extraneous variables (EVs) comprising $\delta_h$ are independent from one model error to the next; i.e., if $u_h$ and $u_{h^*}$ denote any two EVs making up a component part of $\delta_h$ and $\delta_{h^*}$, respectively, then a diagonal $\Sigma_\delta$ implies that $E(u_h u_{h^*}) = 0$; otherwise, $E(u_h u_{h^*}) \neq 0$, $E(u_h \delta_h) \neq 0$, and $E(u_{h^*} \delta_{h^*}) \neq 0$ together imply that $E(\delta_h \delta_{h^*}) \neq 0$ which violates the diagonal $\Sigma_\delta$ assumption. Thus, aside from the structural models comprising the system, no other relevant models are associated with the experimental unit under a diagonal $\Sigma_\delta$.³ For if there were, they would be defined by relations among EVs comprising model errors. But since $E(u_h u_{h^*}) = 0$ for $h \neq h^*$, there can be no relations among these EVs. $\Sigma_\delta$ thus provides a quantitative measure of structure resolution.

POSTULATE: Total ignorance regarding structure occurs when structural parameters are under-identified and a reduced regression analysis is the only recourse. Total resolution of structure (relative to a well defined experimental unit) is characterized by a diagonal⁴ $\Sigma_\delta$ which is validated through experimentation. The degree of structure resolution is quantified on a [0, 1] scale in terms of an estimate of $|R|$, where $R$ is the $p \times p$ matrix of model error correlations.

Note that the "invariance"⁵ of the reduced regression under whatever the hypothesized causal scheme provides complete objectivity but total ignorance regarding structure. However, assuming identification, lack of this type of objectivity is no reason to reject structural regression. When two experimenters propose different causal schemes for the same set of data and the matter is not resolved in the ensuing analysis, continued experimentation will validate one or the other or reject both.


³Relevant structural relations among EVs contained wholly within one particular $\delta_h$ would likely indicate that $E(\delta_h) \neq 0$ and/or that the experimental unit needs redefinition.

⁴See [1, p. 260] for a test of the hypothesis that $\Sigma_\delta$ is diagonal.

⁵Invariance is used in the sense that $A$ and $\Gamma$ uniquely determine $B$ in (1.2) though not conversely. Thus, an infinity of $(A, \Gamma)$ structures could lead to the same $B$.
Consider, for example, a sugar beet experiment [5, p. 816], where the stand of the crop was found to have a positive, direct effect on yield. The estimated model error correlation was -.45 (hence, estimated $|R| = .80$) which led to a conjecture of food competition between plants; i.e., apart from the average positive effect of stand on yield, a stand response above its expectation would tend to accompany a yield response below its expectation due to the greater competition by plants for food. The implication of this correlation is that additional structural relations remain to be hypothesized in future experiments and that these relations might involve measures of moisture content and plant food. Unfortunately, under-identification is generally the case in experimental design so that attention is redirected to methods of achieving identification.
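
The sugar beet figure can be reproduced directly: for two models the determinant of the error correlation matrix is $1 - r^2$, so $r = -.45$ gives $1 - .2025 = .7975$, or about .80. The sketch below computes $|R|$ for any $p \times p$ correlation matrix by Gaussian elimination; the function name is illustrative.

```python
def determinant(matrix):
    """Determinant of a square matrix (nested lists) by Gaussian elimination
    with partial pivoting."""
    m = [row[:] for row in matrix]          # work on a copy
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det                      # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


# Sugar beet example: estimated model error correlation of -.45.
resolution = determinant([[1.0, -0.45], [-0.45, 1.0]])
# Uncorrelated model errors (diagonal case) give the upper end of the scale.
fully_resolved = determinant([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
print(round(resolution, 2), fully_resolved)
```

A value of 1 corresponds to uncorrelated model errors, and values near 0 indicate strongly correlated errors, that is, structure still unresolved.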
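3. DESIGNING THE EXPERIMENT TO REMEDY UNDER-IDENTIFICATION

Consider the following two model system describing an experiment in a completely randomized design:

$$\begin{aligned} y_1 &= \mu_1 + \tau_1 + \alpha_{12} y_2 + \delta_1 \\ y_2 &= \mu_2 + \tau_2 + \alpha_{21} y_1 + \delta_2, \end{aligned} \qquad (3.1)$$

where $\mu_h$ and $\tau_h$ denote mean and direct treatment effects, respectively. Since $q < p_h + q_h$, additional information is necessary to estimate structural parameters. For example, the reduced model errors corresponding to (3.1) are $\varepsilon_1 = (\delta_1 + \alpha_{12}\delta_2)/(1 - \alpha_{12}\alpha_{21})$ and $\varepsilon_2 = (\delta_2 + \alpha_{21}\delta_1)/(1 - \alpha_{12}\alpha_{21})$, whereupon, from $\Sigma = A^{-1}\Sigma_\delta A'^{-1}$,

$$\Sigma = \frac{1}{(1 - \alpha_{12}\alpha_{21})^2} \begin{pmatrix} \sigma_{\delta_1}^2 + 2\alpha_{12}\sigma_{\delta_1\delta_2} + \alpha_{12}^2\sigma_{\delta_2}^2 & \sigma_{\delta_1\delta_2}(1 + \alpha_{12}\alpha_{21}) + \alpha_{21}\sigma_{\delta_1}^2 + \alpha_{12}\sigma_{\delta_2}^2 \\ \text{(symmetric)} & \sigma_{\delta_2}^2 + 2\alpha_{21}\sigma_{\delta_1\delta_2} + \alpha_{21}^2\sigma_{\delta_1}^2 \end{pmatrix}$$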
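
The element-by-element form of $\Sigma$ can be checked numerically against the matrix product $A^{-1}\Sigma_\delta A'^{-1}$. In the sketch below the function names and parameter values are hypothetical, and $A$ is taken as the matrix with unit diagonal and off-diagonal entries $-\alpha_{12}$ and $-\alpha_{21}$, which is what the two-model system (3.1) implies when the feedback terms are moved to the left side.

```python
def reduced_covariance_closed_form(a12, a21, var1, var2, cov12):
    """Elements of Sigma written out term by term for the two-model system."""
    d2 = (1.0 - a12 * a21) ** 2
    s11 = (var1 + 2.0 * a12 * cov12 + a12 ** 2 * var2) / d2
    s12 = (cov12 * (1.0 + a12 * a21) + a21 * var1 + a12 * var2) / d2
    s22 = (var2 + 2.0 * a21 * cov12 + a21 ** 2 * var1) / d2
    return [[s11, s12], [s12, s22]]


def reduced_covariance_by_matrices(a12, a21, var1, var2, cov12):
    """Sigma = A^{-1} Sigma_delta A'^{-1} with A = [[1, -a12], [-a21, 1]]."""
    d = 1.0 - a12 * a21
    a_inv = [[1.0 / d, a12 / d], [a21 / d, 1.0 / d]]
    sigma_delta = [[var1, cov12], [cov12, var2]]
    left = [[sum(a_inv[i][k] * sigma_delta[k][j] for k in range(2))
             for j in range(2)] for i in range(2)]
    # Multiply by the transpose of a_inv on the right.
    return [[sum(left[i][k] * a_inv[j][k] for k in range(2))
             for j in range(2)] for i in range(2)]


# Hypothetical parameter values.
params = dict(a12=0.5, a21=0.25, var1=1.0, var2=2.0, cov12=0.3)
closed = reduced_covariance_closed_form(**params)
product = reduced_covariance_by_matrices(**params)
print(closed)
```

The symmetric 2 x 2 matrix $\Sigma$ supplies only three distinct elements, while the structure contains five unknowns ($\alpha_{12}$, $\alpha_{21}$, the two error variances, and the error covariance).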


Equating $\Sigma$ to $S$ in (1.5) yields three equations in five unknowns. If, however, there is no causal feedback with only a direct effect of $y_2$ on $y_1$ (i.e., $\alpha_{21} = 0$) and the ratio $\lambda^2$ is known, then we have three equations in three unknowns, and $\alpha_{12}$ and $\rho = \sigma_{\delta_1\delta_2}/(\sigma_{\delta_1}\sigma_{\delta_2})$ are estimated by

a12  - 1 - x2s22sj12  and  r - Xs22s^2  + (xs^sjJ)  - x"1 . 

Since knowledge of $\alpha_{21}$ and $\lambda$ is not often available, another recourse is to assume that (3.1) applies to the first replication of the experiment and that

$$y_h' = \mu_h' + \tau_h + \alpha_{hh^*} y_{h^*}' + \delta_h' \qquad (3.2)$$

applies to the second replication, primes marking second-replication quantities. Note that while $\alpha_{hh^*}$ and treatment effects remain the same between replications, the block effects may differ. Moreover, it will generally be the case that $E(\delta_h \delta_{h^*}') = 0$ for $h, h^* = 1, 2$. Subtracting $y_h'$ from $y_h$ in (3.1) and (3.2) yields

$$y_h = y_h' + (\mu_h - \mu_h') + \alpha_{hh^*}(y_{h^*} - y_{h^*}') + (\delta_h - \delta_h') \qquad (3.3)$$

Consider, first, estimation of parameters of the first model in (3.1). Using the result in (3.3), (3.1) can be replaced by

$$\begin{aligned} y_1 &= \mu_1 + \tau_1 + \alpha_{12} y_2 + \delta_1 \\ y_2 &= y_2' + (\mu_2 - \mu_2') + \alpha_{21}(y_1 - y_1') + (\delta_2 - \delta_2') \end{aligned} \qquad (3.4)$$

Taking the $y_h'$ as controlled variables amounts to replicating the experiment and using one replication as an instrumental variable [2, 3, 5] for the other⁶. All parameters in (3.2) are identified as is made obvious by referring to the corresponding reduced system, given as follows:

$$y_1 = \left\{ [\mu_1 + \alpha_{12}(\mu_2 - \mu_2')] + \alpha_{12} y_2' - \alpha_{12}\alpha_{21} y_1' + \tau_1 + \delta_1 + \alpha_{12}(\delta_2 - \delta_2') \right\} / \Delta$$

where $\Delta = 1 - \alpha_{12}\alpha_{21}$. It is clear that structural coefficients are over-identified.

As for estimating $\tau_2$, (3.1) is replaced by

$$\begin{aligned} y_1 &= y_1' + (\mu_1 - \mu_1') + \alpha_{12}(y_2 - y_2') + (\delta_1 - \delta_1') \\ y_2 &= \mu_2 + \tau_2 + \alpha_{21} y_1 + \delta_2 \end{aligned}$$

and 2SLS can be applied directly as in (3.4).

There is a price to be paid in using the replication method to produce identification. Firstly, the sample size is halved which reduces power. Secondly, if $Y_h$ and $\hat{Y}_h$ (the consistent estimate of $Y_h$ obtained through the reduced system) are not highly correlated, the resulting structural estimates may be highly inefficient. However, the alternatives are (1) complete reliance on a reduced analysis (which should always accompany and complement a structural analysis) and (2) OLS estimation which generally leads to inconsistent estimates, but which may provide certain estimates with low mean square error.

⁶In the same manner that lags are used in econometric models, one replication can be considered as a lag for the other replication.
THE SAMUEL S. WILKS MEMORIAL MEDAL
BANQUET REMARKS

Frank E. Grubbs, Program Chairman of the Conference


The twenty-third year or occasion for the Design of Experiments Conference in Army Research, Development and Testing marks another very significant milestone for Statistical Methods in the Army. Each year I like to reflect back over previous conferences, and it is easy to see how great a debt we owe to the memory of Sam Wilks for his vision in getting Army statisticians together on a yearly basis for the common good of all. Indeed, we continue to benefit considerably from our previous 22 conferences, which have promoted much good statistical work in the US Army. Don't you agree? The associations with our statistical friends from the universities have kept us up to date and provided much stimulus toward many timely accomplishments. These conferences have done a lot of good by simply getting us all together on problems of common interest and we cover so many fields of interest! Again, I am reminded we have not stuck to the title, "Design of Experiments", in all detail, but that is good as the field of statistical topics changes fast and we must always move on to new things or areas. I could go on and on concerning the good these conferences have accomplished, but I must mention that the success of these conferences would not have been so great were it not for our most dedicated friend, Francis Dressel, who as we all know again deserves a vote of thanks at this time for his effective, continuing contributions (so sorry he couldn't make it this year). Also, this is the first time we have been privileged to have our conference here at Monterey and we appreciate such nice facilities, and also Doug Tang, Wally Foster and Bob Launer are to be thanked for the very significant part they played again this year.

We  now  turn  to  the  Samuel  S.  Wilks  Memorial  Medal. 

The Samuel S. Wilks Memorial Medal Award, initiated jointly in 1964 by the US Army and the American Statistical Association, is administered for the Army by the American Statistical Association, a non-profit, educational and scientific society founded 138 years ago in 1839. The Wilks Award is given each year to a statistician - often a good one! - and is based primarily on his contributions to the advancement of scientific or technical knowledge in Army statistics, ingenious application of such knowledge, or successful activity in the fostering of cooperative scientific matters which coincidentally benefit the Army, the Department of Defense, the US Government, and our country generally. The Award consists of a medal, with a profile of Professor Wilks and the name of the Award on one side, the seal of the American Statistical Association and name of the recipient on the reverse, and a citation and honorarium related to the magnitude of the Award funds, which were donated by Philip W. Rust of the Winnstead Plantation, Thomasville, Georgia. The Annual Army Design of Experiments Conferences, at which the Wilks Medal is given each year, are sponsored by the Army Mathematics Steering Committee on behalf of the Office of the Chief of Research and Development and Acquisition, Department of the Army.

Previous recipients of the Samuel S. Wilks Memorial Medal include John W. Tukey of Princeton University (1965); Major General Leslie E. Simon (1966); William G. Cochran of Harvard University (1967); Jerzy Neyman of the University of California, Berkeley (1968); Jack Youden (1969) formerly of the National Bureau of Standards; George W. Snedecor (1970) formerly of Iowa State University; Harold Dodge (1971) formerly of the Bell Telephone Laboratories; George E. P. Box of the University of Wisconsin (1972) - and with us today; H. O. (HO) Hartley of Texas A&M University (1973) - and our keynote speaker; Cuthbert Daniel (1974) - private statistical consultant; Herbert Solomon of Stanford University (1975) - who just trekked to the United Kingdom for two years with ONR; and Solomon Kullback of George Washington University.

This brings us up to this year, and I call on Jeff Kurkjian,
University of Alabama, Chairman of the S. S. Wilks Memorial Medal
Committee, to discuss this year's committee work and give the presentation.

SAMUEL S. WILKS' MEMORIAL MEDAL COMMITTEE:

MEMBERSHIP, CHARTER, SELECTION PROCEDURE
Badrig Kurkjian, University of Alabama

The 1977 Committee was made up of Badrig Kurkjian, Chairman, Francis
Anscombe, Jerome Cornfield, Cuthbert Daniel, Fred Frishman, Frank Grubbs,
Joan R. Rosenblatt, and Herbert Solomon. Three of these members were
former employees of the US Army with virtually career-long experience
with the Army Design Conference. Three others have considerable
experience consulting with the Army on technical problems and policy
matters associated with the business of the Army Mathematics Steering
Committee. Moreover, the Committee contained three former Medalists:
Cuthbert Daniel, Frank Grubbs, and Herbert Solomon.

One could summarize the charge to the Committee by stating simply
that the recipient of the Wilks' Medal should be a person who has
emulated Sam Wilks to a significant extent, that is, a scholar, a
contributor to statistical methodology and one who unstintingly devoted
significant effort to the public interest, in particular the U. S. Army
Design Conference in Sam's case.

Each year the Committee considers nominees from prior years as well
as those forwarded to the Committee from various sources within the
statistical community in the Army and elsewhere. This year, the ballot
contained twelve nominees, each of whom is a nationally, or internationally,
renowned statistician. As might be expected, the voting is usually
very close and two ballots are required to select the recipient. However,
this year the Wilks' Medalist, Dr. Churchill Eisenhart, Senior Research
Fellow, National Bureau of Standards, was the clear winner on the first
ballot. The Committee had no difficulty in recognizing how closely
Dr. Eisenhart's professional career matches that of Sam Wilks.

The Army Design Conference was privileged to have Professor G. E. P.
Box, University of Wisconsin and incoming President of the American
Statistical Association, present Dr. Eisenhart with the Medal, the
official Citation, and a modest monetary honorarium.

REMARKS OF CHURCHILL EISENHART ON ACCEPTING
THE 1977 SAMUEL S. WILKS MEMORIAL MEDAL


Chairman Grubbs, President-Elect Box, Fellow Statisticians, Ladies and
Gentlemen:

This is for me a very happy occasion as I express my very great
pleasure in accepting the 1977 Samuel S. Wilks Memorial Medal that
honors my teacher, long-time friend, and the initiator of these
Experiment Design Conferences. I especially appreciate the high honor
of being presented this award, having served as a member of the Wilks
Memorial Medal Committee of the American Statistical Association from
1965 through 1970.

I have spoken in great detail about Sam Wilks at two preceding
Conferences of this series, the 10th and the 20th: about his extensive
contributions toward the advancement of statistical methods in Army
research, development and testing, and about his many other important
contributions in the national interest. I shall limit myself on this
occasion to sketching how very, very helpful Sam was to me in the early
stages of my career. Generosity in helping others in spite of his
own heavy schedule was one of Sam's outstanding characteristics.

I was Sam's first student in statistics. He arrived in Princeton
in September 1933 in time to supervise my Senior Thesis on "The Accuracy
of Computations Involving Quantities Known Only to a Given Degree of
Approximation". The first part was an attempt to present a fairly
complete survey of the accuracy of the general processes of arithmetic
without recourse to probability theory and the methods of statistics,
which were introduced and applied in the second part.

Sam also supervised the preparation of my first two publications
in statistics. The first was a short note in the December 1935 issue
of the American Journal of Science criticising the statistical approach
employed in a paper appearing in the May 1935 issue (too harshly, my
geologist friend, W. C. Krumbein, says). The object of the paper on
which I commented was to suggest a numerical measure of the degree of
"likeness" of two or more "heavy mineral suites" with respect to their
mineral contents. The measure of agreement or "likeness" advocated
was such that the value obtained in a particular instance depended
upon the order in which the respective minerals were listed: if listed
alphabetically by their names in English, one value would result; if
listed alphabetically in some other language, a different value would
be found; and, if in order of their respective densities, still another
value.

I suggested an approach via the χ² test of the homogeneity of
frequency data arranged in an r × c table, and referenced R. A.
Fisher's Statistical Methods for Research Workers. I would never have
had the courage to submit this critical note for publication had Sam
not been standing behind me all the way.

My second statistical paper, "A Test for the Significance of
Lithological Variations", published in the December 1935 issue of the
Journal of Sedimentary Petrology, was an exposition, for geologists,
of the χ² test for homogeneity, with three worked examples utilizing
data from the paper discussed in the note. This seems to have been
the first exposition of χ² methods in the literature of geology.
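
As an illustration of the χ² test of homogeneity mentioned above, the
following is a minimal sketch in Python, not a reproduction of the worked
examples in the 1935 paper. The grain counts are hypothetical and the
function name is chosen here for illustration only. Expected counts are
formed from the row and column totals, and the statistic is referred to
a χ² distribution on (r - 1)(c - 1) degrees of freedom.

```python
def chi_square_homogeneity(table):
    """Return (chi2, degrees_of_freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if all rows share the same column proportions.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, dof


# Hypothetical grain counts: rows are samples, columns are mineral species.
counts = [
    [30, 50, 20],
    [20, 40, 40],
]
chi2, dof = chi_square_homogeneity(counts)
print(f"chi-square = {chi2:.3f} on {dof} degrees of freedom")
```

Unlike the "likeness" measure criticized in the note, this statistic does
not depend on the order in which the minerals (columns) are listed.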

Several months before those two papers appeared in print, I had
left Princeton, at Sam's recommendation, for University College, London.
I went there to work toward a Ph.D. in Statistics under Jerzy Neyman
and Egon S. Pearson in the Department of Statistics. I also attended
the lectures that R. A. Fisher (of the Galton Laboratory for National
Eugenics) was giving on Experiment Design and on the History of Biometry;
and at his request, prepared a little brochure on the use of ranked
normal deviates in the analysis of data expressed as ranks, for the
guidance of some of Professor Cyril Burt's graduate students in psychology.
At the Annual Karl Pearson Memorial Dinner at University College in the
spring of 1959, Egon Pearson introduced me as "one of the few persons
who worked with Fisher, Neyman and a Pearson and managed to survive".

While I was at University College, a circumstance occurred that
enabled me to help Sam for a change: Professor George G. Chambers of
the University of Pennsylvania had died on 24 October 1935, shortly
after his graduate course "Modern Theory of Statistical Analysis" got
underway. Sam was commissioned to complete the teaching of this course.
He wrote me a hurried note saying that he was in dire need of up-to-date
problems in statistical theory and methods for the students in his
new class. Would I please send him some quickly. From time to time
throughout the remainder of that academic year, I sent off to Sam a
bundle of homework and test problems that we had been given in the
courses that I was taking under Neyman, Pearson, B. L. Welch and Fisher.

Sam's next turn to help me came in the fall of 1937, when I took up
my post as Station Statistician at the Wisconsin Agricultural Experiment
Station. To find one's self the expert on statistics in a major research
organisation immediately after finishing one's doctoral program, without
a period of "internship" training in applied work, with no senior expert
at hand to consult, is a trying experience, to be avoided if possible.
At Wisconsin, however, I had the advantage that I did not have to "sell"
statistical methods to the staff of the Experiment Station. There were
already on the campus several agricultural research workers who had
taken courses in statistics under George Snedecor at Iowa State or
studied biometry under Forrest Immer at the Minnesota Agricultural
Experiment Station. These fellows were for the most part quite
self-sufficient in statistics. Nonetheless, they were a source of difficulty
for me: They would bring me hard problems to which the straightforward
procedures that they had learned from Snedecor or Immer did not apply.
I tackled these as best I could, and sent a draft to Sam in Princeton
for his approval, correction, or other counsel. Only then did I turn
over my "solution" to the "client".

More of a problem to me were the members of the Experiment Station
staff who had acquired a smattering of statistical techniques of
experiment design from lectures given there a previous summer by Cyril
H. Goulden of the University of Manitoba. As an admirer of Goulden and
his writings I have not the slightest doubt that what he presented in
his lectures was entirely correct; but some of his listeners seem to
have missed some of the essential details.

Thus, soon after my arrival, I was confronted with the results of a
field trial of 24 varieties each replicated 4 times in a 4 x 4 rectangular
arrangement of 16 cells with 6 varieties in each cell. (I do not recall
the exact number of varieties involved, nor the exact size of the
rectangular design, but the choices here will serve to bring out the
problem I faced.) The disposition of the 4 replicates of each variety
was such that each variety occurred once and only once in each cell-row
and each cell-column.

I got a lot of argument from my consultees when I tried to convince
them that, in spite of the last-mentioned restrictions, this arrangement
was NOT a Latin Square; could not be analysed as such; that the best
that could be done would be to do a Randomized Blocks analysis with
the cell-rows as "blocks", and again with the cell-columns as "blocks",
and then use whichever analysis led to the smaller residual mean square
for "error".
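
The procedure just described can be sketched as follows. This is a
minimal illustration in Python under stated assumptions, not the analysis
actually carried out at Wisconsin: the layout is a small hypothetical one
(6 varieties in a 3 x 3 arrangement of cells, each variety once in each
cell-row and each cell-column, 2 varieties per cell), the yields are
invented, and the function and field names are chosen here for
illustration.

```python
from itertools import permutations


def residual_mean_square(plots, block_key):
    """Residual ("error") mean square of a Randomized Blocks analysis.

    plots     -- list of dicts with keys "row", "col", "variety", "yield"
    block_key -- "row" to use cell-rows as blocks, "col" for cell-columns
    Assumes each variety occurs exactly once in each block.
    """
    blocks = sorted({p[block_key] for p in plots})
    varieties = sorted({p["variety"] for p in plots})
    y = {(p[block_key], p["variety"]): p["yield"] for p in plots}
    nb, nv = len(blocks), len(varieties)
    grand = sum(y.values()) / (nb * nv)
    block_mean = {b: sum(y[b, v] for v in varieties) / nv for b in blocks}
    var_mean = {v: sum(y[b, v] for b in blocks) / nb for v in varieties}
    ss_resid = sum(
        (y[b, v] - block_mean[b] - var_mean[v] + grand) ** 2
        for b in blocks
        for v in varieties
    )
    return ss_resid / ((nb - 1) * (nv - 1))


# Hypothetical layout: variety k sits in cell (row, perm_k[row]), where
# perm_k runs over the 6 permutations of the 3 cell-columns.
plots = []
for variety, perm in enumerate(permutations(range(3))):
    for row, col in enumerate(perm):
        # Invented yields with a strong cell-row trend, weak column trend.
        plots.append({"row": row, "col": col, "variety": variety,
                      "yield": 10 + 3 * row + 0.5 * col + variety})

ms_rows = residual_mean_square(plots, "row")
ms_cols = residual_mean_square(plots, "col")
choice = "cell-rows" if ms_rows <= ms_cols else "cell-columns"
print(f"rows: {ms_rows:.2f}  columns: {ms_cols:.2f}  use {choice} as blocks")
```

In this invented example the cell-row trend dominates, so blocking on
cell-rows leaves the smaller residual mean square.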

In view of the considerable unhappiness of the consultees at this verdict,
and being not entirely sure that something better could not be done, I
sent the whole package off to Sam in Princeton. He replied by return
mail saying that in this particular instance I was entirely correct,
inasmuch as the experimenters had failed to group the 24 varieties into
4 "bundles" of 6 varieties each. Had they done this and arranged the
4 replicates of these bundles in accordance with a 4 x 4 Latin Square,
they would have had a Split-Plot Latin Square, a design that I didn't
recall Fisher having discussed in his lectures. They then would have
been able to do a regular Latin Square analysis with respect to the 4
different (but fixed) "bundles", leading to two "error" mean squares,
one appropriate to comparing varieties in the same bundle, and one
for comparing varieties in different bundles.



The point of all this is that he always took the trouble and the
time to respond promptly and very helpfully by return mail, in this
instance at a time when he was already enormously busy with his
teaching, his work for the College Entrance Examination Board and his
new duties as Editor of the Annals of Mathematical Statistics.

I could go on, but I believe that I have said enough to reveal
that Samuel Stanley Wilks was the distinguished mathematical
statistician who was my closest teacher, who launched me into my career, and
who was also a wise and greatly loved friend and counselor from the
moment of my first meeting him.

I shall cherish this medal bearing his name and his likeness.

THREE DIMENSIONAL CURVE FITTING TECHNIQUES TO
EXPRESS SUPPRESSION AS A FUNCTION OF RANGE
AND ASPECT ANGLE


Chaunchy  F.  McKearn  and  David  E.  Brown 
Combat  Developments  Experimentation  Command 
Fort  Ord,  California  93941 


ABSTRACT. 

During the 2nd Quarter FY 1978, the Combat Developments Experimentation
Command will conduct the next in a series of suppression experiments,
SUPEX III. The primary objective of this experiment is to determine
the probability of suppression, $P_s$, as a function of range, $r$, and
aspect angle, $\theta$. Artillery projectiles will be set off in all directions
and at varying ranges from the players, who will be observing through a
periscope in an uncovered foxhole. What is needed is a surface fitting
technique that will permit the surface, $P_s = g(r, \theta)$, to be determined from
the data produced. The level curve for any fixed value of $P_s$ must be a
smooth curve which is perpendicular to the line of observation at the two
points at which the curve intersects this line.

The results of previous experiments indicate that $P_s$ considered as a
function of only offset distance, $x$, $P_s = f(x)$, has an exponential or
logarithmic form. These results also indicate that the probability of
suppression, $P_s$, is not symmetric to the front and rear of the observer.
The curve below shows the general desired form of a level curve for a
fixed value of $P_s$.

[Figure: desired form of a level curve. 1. Location of observer.
2. Direction of observations.]

The difficulty is in arriving at the form of an equation such that any
curve for a fixed value of $\theta$, i.e., $P_s = g(r, \theta_0)$, would be exponential or
logarithmic and the level curve for a fixed value of $P_s$, i.e., $P_0 = g(r, \theta)$,
is a closed curve with continuous derivatives with $dx/d\theta = 0$ for $\theta = 0$
and $\pi$ (to insure smoothness and vertical tangents).

I. INTRODUCTION. Combat Developments Experimentation Command (CDEC)
has conducted a series of suppression experiments to measure the
probability of suppression, $P_s$, as a function of miss distance. Generally,
the players being suppressed represented antitank guided missile gunners
and the suppressive weapons included both direct and indirect fire weapons
from the M16 rifle up to the 8 in. Howitzer. This report concerns only
the indirect fire point detonating high explosive rounds.

In order to collect empirical data on the phenomenon of suppression,
the subjects were placed in protective foxholes as shown in Figure 1 and
observed down range through periscopes. They were task loaded by
requiring them to report the position of a target tank in reference to a row of
numbered panels along its path at a range of 1500 meters. The gunners
were required to track the target tank for fifteen consecutive seconds to
receive credit for hitting the target tank.


The periscopes were instrumented in such a manner that when they were
raised or lowered it was automatically recorded on the central computer.
In addition, each periscope was electrically connected to a pop-up
silhouette immediately in front of the foxhole so that when the periscope was
raised the pop-up silhouette came up and when the periscope was lowered
the silhouette went down. This pop-up silhouette was within the gunner's
field of view and represented the gunner in an unprotected position.
This assisted the subjects in perceiving the danger they would be in if
they were located at the pop-up silhouette's position. If a piece of
shrapnel hit the silhouette, a loud buzzer was set off in the subject's
foxhole indicating that had he been at the pop-up's location he would
have been killed or wounded.

The artillery rounds were placed on the ground within the player's
view at various ranges and statically detonated in a random manner. The
data collected from these tests indicated that the probability of
suppression as a function of miss distance could be reasonably well represented
by an exponential curve of the form

$$P_s = A e^{bx}$$

where

$P_s$ = probability of suppression,
$x$ = distance between the foxhole
and the detonation point, and
$A$ and $b$ are curve fitting parameters.

Figure 2 lists the curve parameters for the various munitions tested in
CDEC's last suppression experiment, SUPEX II.

EXPONENTIAL CURVE PARAMETERS

WEAPON              A          B

60mm Mortar         1.61911    -.02453
81mm Mortar         1.50512    -.01262
105mm Howitzer      1.64851    -.01306
105mm HEP-T         1.70799    -.01317
2.75 in Rocket      1.77098    -.01530
155mm Howitzer      3.26843    -.01773
8 in Howitzer       1.58806    -.00450

FIGURE 2: EXPONENTIAL CURVE PARAMETERS FOR EXPRESSING THE
PROBABILITY OF SUPPRESSION AS A FUNCTION OF
MISS DISTANCE.
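
To show how the Figure 2 parameters are used, the following is a minimal
sketch in Python that evaluates the exponential curve for a given weapon
and miss distance. The (A, b) pairs are copied from Figure 2; everything
else is an assumption made for illustration: the function names, the cap
at 1 (every fitted A exceeds 1, so the raw curve exceeds a probability of 1
at short distances, and the text later speaks of a "truncated"
exponential), and the distance values, whose units are not stated in the
text.

```python
import math

# (A, b) pairs copied from Figure 2 (SUPEX II).
PARAMS = {
    "60mm Mortar": (1.61911, -0.02453),
    "81mm Mortar": (1.50512, -0.01262),
    "105mm Howitzer": (1.64851, -0.01306),
    "105mm HEP-T": (1.70799, -0.01317),
    "2.75 in Rocket": (1.77098, -0.01530),
    "155mm Howitzer": (3.26843, -0.01773),
    "8 in Howitzer": (1.58806, -0.00450),
}


def p_suppression(weapon, x):
    """Ps = A * exp(b * x), capped at 1 (the cap is an assumption)."""
    a, b = PARAMS[weapon]
    return min(1.0, a * math.exp(b * x))


def distance_for(weapon, p):
    """Miss distance at which the fitted curve equals p (0 < p <= 1)."""
    a, b = PARAMS[weapon]
    return math.log(p / a) / b


for name in PARAMS:
    # x = 100 is an arbitrary illustrative distance, units as in Figure 2.
    print(f"{name:15s} Ps(100) = {p_suppression(name, 100):.3f}")
```

The inverse function follows from solving the same equation for x, which
is convenient when a model needs the distance within which suppression
exceeds a chosen probability.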


II.  EXPERIMENTAL  DESIGN. 

Last July CDEC hosted a Suppression Working Meeting to determine what
the next step in CDEC's Suppression Program should be. Many of the
attendees, mostly modelers, expressed the concern that CDEC's suppression data
only addressed suppression caused by detonations directly to the
observer's front. What was needed was a function of range and aspect angle,
$P_s = g(r, \theta)$. To accomplish this the SUPEX III experiment is currently
being planned and is scheduled to begin in April 1978.


The range for this experiment will be laid out as shown in Figure 3.
Four foxholes will be located at the center of the wagon wheel with one
foxhole oriented along each of the four principal axes. Five rounds will
be placed along each of the twelve wagon wheel spokes and set off in a
random manner. When all of the trials are completed there will be
sixteen observations at each range at all twelve aspect angles.


III.  STATEMENT  OF  THE  PROBLEM. 

The data obtained from SUPEX III should permit the development of a
three dimensional surface expressing $P_s$ as a function of range and aspect
angle similar to that shown in Figure 4. Past experience indicates that
for each aspect angle one could expect the data to fit a truncated
exponential and for a fixed value of $P_s$ one should obtain a level curve that
is somewhat elliptical or egg-shaped with continuous derivatives.

One candidate function is as follows:

$$P_s = A e^{rb(1 - \gamma \cos\theta)}$$

where

$P_s$ = probability of suppression
$r$ = miss distance
$A$, $b$ = shaping constants
$\gamma$ = eccentricity
$\theta$ = aspect angle

The difficulty with this function is that in order for the level curves
for a given value of $P_s$ to assume the desired egg shape, $\gamma$ will have to
be a function of $\theta$. This makes it difficult to determine all of the
parameters by such conventional methods as least squares because the
function is no longer linear in its parameters. What is needed is a
method for non-linear regression that can handle a function like this or
a different function which has the desired characteristics and is linear
in its parameters.
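
The shape of the level curves can be explored numerically. The following
is a minimal sketch in Python, assuming the candidate function reads
Ps = A exp(r b (1 - γ cos θ)) with b negative as in Figure 2 (the printed
equation is damaged, so this form is itself an assumption) and assuming a
constant γ. The parameter values and function names are hypothetical.
Solving for r at a fixed Ps = P0 gives r(θ) = ln(P0/A) / (b (1 - γ cos θ)).

```python
import math


def p_suppression(r, theta, a, b, gamma):
    """Assumed candidate surface Ps = A * exp(r * b * (1 - gamma*cos(theta)))."""
    return a * math.exp(r * b * (1.0 - gamma * math.cos(theta)))


def level_curve_radius(p0, theta, a, b, gamma):
    """Range r at which the assumed surface equals p0 at aspect angle theta."""
    return math.log(p0 / a) / (b * (1.0 - gamma * math.cos(theta)))


# Hypothetical shaping constants, chosen only for illustration.
A, B, GAMMA = 1.6, -0.013, 0.3

for degrees in range(0, 181, 30):
    theta = math.radians(degrees)
    r = level_curve_radius(0.5, theta, A, B, GAMMA)
    print(f"theta = {degrees:3d} deg   r = {r:7.2f}")
```

With constant γ between 0 and 1 this r(θ) is the polar equation of an
ellipse with the observer at one focus: the curve reaches farther to the
front than to the rear, but it is still an ellipse rather than an egg
shape, which is consistent with the remark above that γ would have to
vary with θ.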
ON VALIDATING CRITERION REFERENCED TESTS


Milton H. Maier and Stephen F. Hirshfeld

US Army Research Institute for the Behavioral
and Social Sciences

INTRODUCTION 


Skill Qualification Tests (SQT) have been developed to replace
Military Occupational Specialty (MOS) proficiency tests as measures of
ability to perform Army enlisted jobs. SQTs are performance-based,
criterion-referenced measures of job proficiency, consisting of precisely
defined tests of tasks, all of which are critical and necessary to
performance of the job. The criterion-referenced approach provides an
explicit relationship between job requirements and test content in that
job requirements dictate content of SQTs. The SQT development process
requires that tests be reviewed by subject matter experts and validated
on representative job incumbents to assure that test content is job
relevant. Test standards of acceptable levels of performance are also
based on job requirements and test content. Performance standards are
based on behaviorally derived absolute scoring standards, and are not
based on performance relative to other soldiers who take the test. For
these reasons SQTs are justifiably viewed as criterion-referenced tests
of job proficiency.

This paper provides a description of the SQT program, its evolution,
underlying assumptions, requirements, construction and validation
processes, and methods of statistical analysis. It concludes with a set of
questions characterising some of the major issues still under review.

Background. Army training during the late 1960's and early 1970's
experienced a major revolution. Performance-based training and testing,
based on critical job tasks and criterion-referenced standards of
performance, were being implemented in entry-level training courses.
Training objectives were operationally defined by the performance tests
given during the course, and the tests were made public to students as
well as instructors. Because of the direct relevance of these tests to
the job, they were capable of focusing Army training activities.

By maintaining accountability, tests become effective instruments
for institutional change. Test content helps implement doctrine about
the way jobs are to be performed, and is helpful in defining training
requirements and standards. The public nature of the tests helps focus
attention on the critical elements of the job and enables effective use
of soldiers' time in preparing for tests, thus improving individual
readiness.

So impressive was the success of performance-based training and
testing that the Army made the policy decision to change from the existing
mode of "norm-referenced, paper-and-pencil testing," to the
criterion-referenced mode of proficiency testing. These new criterion-referenced
tests, called Skill Qualification Tests (SQT), are having a profound
impact on the entire Army community. The new testing procedures are
forcing training managers, personnel managers, and research support
personnel to rethink and often redefine their functions.

REQUIREMENTS  OF  SKILL  QUALIFICATION  TESTS 

The basic requirement of SQTs is that the tests are job relevant.
The test content must be based on job requirements, and the test scores
must be accurate measures of ability to perform critical job tasks.

Training and Personnel Management. SQTs are used by both training
and personnel management to help make important decisions affecting the
career development of soldiers. Both training and personnel management
need timely and accurate information about how well individuals are
performing - training management to determine training requirements of
individuals, and personnel management to help determine who to promote,
reclassify, or reassign. Although both training and personnel management
have a need for the same kind of information, their immediate
requirements are not identical.

Training managers base their immediate training requirements on the
specific tasks performed in their units. Therefore, from this point of
view relevance of the tests for specific job assignments is the primary
consideration, and it is defined in terms of the tasks that soldiers
perform in their assignments. The set of tasks performed in an
assignment is generally a subset of tasks required in a specialty. The task
is a convenient unit for determining training requirements because tasks
are observable, have initiating and terminating cues, and have standards
of performance that can be reasonably well specified. Decisions about
proficiency can be made at the task level, and training managers can
identify the specific tasks on which soldiers need training. If the
test measures performance on the specific tasks for which the training
managers have responsibility, then the tests are serving their basic
purpose.

Personnel managers are also concerned with the job performance of
individual soldiers; but rather than focusing on soldiers' specific
assignments, personnel managers need to know how well soldiers can
perform all the tasks in a specialty. For example, performance in a specialty
such as Infantryman or Wheeled Vehicle Mechanic cannot necessarily be
inferred from the set of tasks found in any one assignment. Personnel
managers, therefore, have a need for information based on a standard set
of tasks for each specialty. All soldiers in a specialty need to be
evaluated on the same set of tasks to enable fair decisions about which
soldiers to promote, retain, or reclassify. The need for a standard set
of tasks in each specialty imposes additional testing requirements for
feasibility and acceptability. The test scores should not be affected
by when or where the test is taken, nor by whom it is administered and
scored. The testing conditions, as well as performance standards,
should be standardised.

The requirement for Army-wide standardization at the present state
of the art in testing means that initially most of the test content is
in the paper-and-pencil mode rather than hands-on performance tests.
Paper-and-pencil tests generally lack the apparent job relevance of
hands-on performance tests, and therefore an additional requirement is
imposed to assure that the tests are acceptable to examinees, supervisors
and commanders as valid measures of job proficiency.

Job relevance of the tests is the basic requirement for both training
and personnel management, even though job relevance may have somewhat
different meanings for the two purposes. For training
purposes the focus is on the subset of tasks performed in the specific
job assignment, whereas for personnel purposes the interest is in the
entire set of tasks in the specialty.

Because of the strategic importance of Skill Qualification Tests to
both training and personnel management, high level policy decisions were
made about test content, validation, and scoring. The general
requirements of the program are that tests must be fair and feasible.

Fairness and Feasibility of the Tests. Fairness means that all
soldiers have an equal opportunity to demonstrate their true level of
job competence. Test content must be based on actual job requirements,
and testing conditions must be sufficiently constant throughout the Army
so that scores obtained from administrations under varied conditions are
not noticeably different. Tests given in Alaska, Panama, and Korea must
all be administered under similar conditions, and, in addition, all
persons administering and scoring the tests must be able to do so
accurately and objectively. An additional requirement is that the tests
must be acceptable to soldiers and knowledgeable experts as fair measures
of ability to perform critical job tasks. Therefore, fairness attends
to requirements of both training and personnel management.

Feasibility requires that the tests be suitable for administration
in all types of units; equipment, terrain, personnel and all testing
material must be readily available. Another aspect of feasibility is
that testing time must be reasonable, with up to one day allowed for
testing each soldier.

Form of Testing. The requirements that Skill Qualification Tests
be fair and feasible put severe limitations on the use of hands-on
performance tests. The history of performance testing is that scoring
accuracy and standardization are difficult to obtain. The resolution of
the fairness and feasibility requirements is to have several kinds of
testing. Under present policy decisions, all Skill Qualification Tests
contain a written component, and some Skill Qualification Tests contain
a hands-on component. Four hours of testing is allowed for the written
component, and up to four hours is allowed for the hands-on portion.

Hands-on performance tests are most desirable. They are a form of
structured observation where a scorer evaluates an individual on a set
of performance measures (observable behaviors). Advantages of hands-on
testing are obvious: It tests actual performance, has high fidelity to
the job, allows for immediate feedback, and has high face validity to
examinees. However, considerable developmental effort is required to
insure scoring reliability and standardisation of conditions. It also
is expensive in terms of equipment, personnel, and time, i.e.,
feasibility is often a problem. In order to ensure feasibility there is a
natural tendency to truncate tests of tasks by shrinking the boundaries.
Unfortunately, this may be at the expense of the validity of the test.

For these reasons it is extremely difficult, if not impractical, to
initiate a large scale hands-on testing system for an organization as
large as the Army. Therefore, a hands-on component constitutes a subset
of an SQT.

The decision to include a written component requires careful
consideration and analysis of what criterion-referenced measurement means in this
context. Since the focus of Skill Qualification Tests is on ability to
perform critical job tasks, that aspect must be retained. Each written
test of a task is to consist of a set of items, where each item is
designed to measure an essential behavior or step in performing the task.

For  tasks  that  require  primarily  mental  skills,  such  as  the  supply  and 
administration  fields,  written  tests  of  tasks  are  often  similar  to  the 
behaviors  required  on  the  job,  and  the  standards  for  ability  to  perform 
the  test  of  the  tasks  can  be  reasonably  close  to  those  on  the  job.  For 
other  tasks  that  require  psychomotor  skills,  written  test  items  only 
simulate  actual  job  behaviors,  and  the  setting  of  realistic  standards 
indicating  ability  to  perform  the  tasks  is  a more  arbitrary  process.  To 
help approximate realistic job conditions, written items may have multiple
correct responses and a variable number of alternatives. This added
flexibility increases the difficulty in developing appropriate methods
for setting standards. The determination of reasonable standards for
written tests of tasks is one of the most difficult issues in the SQT
program.


Criterion-Referenced Measurement of Task Performance. Because Army
jobs and training programs are structured in terms of critical tasks,
the appropriate level of analysis for the SQT should also be based on
tasks. The concept of "scorable unit" was invented to help assure
criterion-referenced measurement of task performance. A scorable unit
is designed to measure ability to perform a specific task, or in the
case of complex tasks, a well defined subtask.

Each written scorable unit consists of a set of items, where each
item is designed to measure an essential behavior or step in performing
the task. Each item is scored pass-fail, and a prescribed number of
items must be passed to be GO on the written scorable unit. A GO is
counted as ability to perform the task. The current resolution to
setting standards for written scorable units is to require that an a
priori number of items be passed. For example, if a scorable unit
contains five items, then four must be passed to obtain a GO.
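
The GO/NO-GO rule just described can be sketched in a few lines of Python. This is only an illustration of the counting rule; the function name `score_unit` and its arguments are hypothetical and do not come from the SQT program.

```python
from typing import Sequence


def score_unit(item_results: Sequence[bool], required_passes: int) -> str:
    """Score one scorable unit from its pass-fail item results.

    item_results    -- True for each item passed, False for each item failed
    required_passes -- the a priori number of items that must be passed
    """
    if not 0 <= required_passes <= len(item_results):
        raise ValueError("required_passes must lie between 0 and the number of items")
    passed = sum(1 for result in item_results if result)
    return "GO" if passed >= required_passes else "NO-GO"


# Five-item unit with a four-item standard, as in the example in the text.
print(score_unit([True, True, False, True, True], required_passes=4))
print(score_unit([True, False, False, True, True], required_passes=4))
```

The first call has four passes and meets the standard; the second has only three and does not.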

Hands-on scorable units consist of a set of performance measures,
where each performance measure is scored pass-fail, and a prescribed
number of performance measures must be passed to be GO on the scorable
unit. A GO on the scorable unit is interpreted as ability to perform
the task. The standards of GO generally are comparable to what is
required on the job.

The requirement that all scorable units be acceptable as fair
measures of ability to perform tasks is applied to both the hands-on and
written tests. Juries of experts must agree that the written items and
hands-on performance measures reflect ability to perform the tasks.
Perhaps a safer statement would be that failure to pass the items
indicates that the person is not able to perform the task.

Establishing  a Correspondence  Between  Test  Content  and  Job  Tasks. 

The most critical requirement of SQTs is their job relevance. The
procedures for establishing job relevance are described in this section.

Test content of all SQTs is a sample of critical tasks from the domain
of job tasks in the specialty. In this way the tests have a specifiable
and explicit link to the job. For each Army job there exists a Soldier's
Manual that lists the tasks for which a soldier in that specialty is
responsible. Therefore, this set of tasks becomes the operational
definition of the job. Tests to measure performance on specific job
tasks listed in the Soldier's Manual are developed from appropriate task
analyses, and the tests for each task are operational definitions of
performance on the tasks. Performance on the individual tasks is summed
to obtain a total score, which in turn serves as the operational
definition of job competence. Modern instructional technology, with its
emphasis on specification of objectives and verification that those
objectives are attained, supports the above process for establishing the
content and focus of SQTs, and thereby lends added credibility to these
procedures.

Though the task is the basic level of analysis, the validity of
task proficiency measurement depends on the adequacy of the test of the
task. By means of detailed task analyses, the set of performance measures
or behaviors required for successful performance of the task is
identified. These lists of performance measures are all available in the
Soldier's Manual. Each item developed to test for task proficiency must
occupy a clearly specified relationship to a performance measure required
in task performance. Assuming that the set of items developed for a test
of a task has been selected in accordance with the procedures described
above, one may assume with reasonably high confidence that successful
performance of each tested behavior is a necessary condition for
successful performance of the task. How to score the set of items in a written
scorable unit to obtain estimates of ability to perform tasks is a complex
question. Measurement error is always a problem that must be allowed for.
Whether being scored GO on a test of a task requires passing all items
included in the test of the task, or some number less than perfection,
depends on the nature of the task, the fidelity with which the task can be
tested in a written mode, the complexity of the format (e.g., multiple
correct responses), and the number of items within the cluster. Use of
subject matter experts in reaching such a determination is mandatory.

In the case of a hands-on test of a task, measurement error arising
from the use of words is minimized. However, other measurement problems
arise. One is that a full performance test of a task generally is not
feasible. It may be too costly in terms of time, equipment, and personnel.
Therefore, a truncated test of the task is often developed by eliminating
some of the performance measures or steps required for the full performance
test. When the test is truncated, though, the tested portion may be
necessary for successful task performance without being sufficient.

Validate Tests Prior to Administration. A first question to be
resolved is how to define validity. The starting point is the usual
definition of validity, i.e., that the tests measure what they are intended to
measure. In the case of Skill Qualification Tests, the intent is to
measure ability to perform critical job tasks. The content of the tests,
therefore, becomes the crucial factor in establishing validity. The
content must be thoroughly reviewed by experts to ensure that the right
behaviors and decisions are assembled in each scorable unit. The first
requirement, then, is consistent agreement among experts that the content
of the test is based on ability to perform critical job tasks. A
second requirement is that the scorable units discriminate between
performers (masters) and nonperformers (nonmasters). A third requirement
applies only to written scorable units. All items in a written scorable
unit must be consistent estimators of mastery on the task covered by the
entire scorable unit. Thus, the conceptualizing of validity focuses on
consistency: consistency between the content of the test and the job
tasks, consistency among expert reviews, and consistency in identifying
mastery.


DEVELOPMENT  PROCESS 


Skill Qualification Tests are constructed and validated by Army
agencies that have resident expertise in the job specialties. Generally
these are the Army schools, but they also include other agencies, such
as the Health Services Command. Since the test content must reflect job
tasks, the test developers must have detailed task analyses available
that identify the behaviors essential to successful performance of the
tasks.

The development process for Skill Qualification Tests may be
conceptualized in four steps:

1. Identify job tasks for testing; these tasks require special
training or are frequently failed.

2. Identify behaviors or steps essential for performing each task;
the intent is to identify the steps that are necessary and sufficient
for successful task performance.

3. Develop scorable units (tests of tasks) to measure essential
behaviors for the tasks; items in scorable units must have an explicit
relationship to task steps, and the scorable unit as a whole must
correspond to performance of the task; items are scored pass-fail (1 or 0),
and scorable units are scored GO/NO-GO (also 1 or 0) to reflect mastery
or nonmastery of the task according to the prescribed standards; the
number of scorable units scored GO is a measure of job proficiency.

Content of the Skill Qualification Tests is fixed after these three
steps are completed. Experts review (a) the tasks selected for testing
to make sure they are critical to the job; (b) the behaviors required to
perform the task to make sure they are necessary and sufficient; and (c)
the scorable unit to make sure that the items correspond to the behaviors.
After the experts agree on the appropriateness of the test to job
requirements, the test content cannot be changed.

4. Try out scorable units on soldiers.

This step serves only to establish the measurement properties of
the tests. Items found to be unsatisfactory through the tryout can be
revised, as long as the test content is not changed.

STATISTICAL  ANALYSIS  OF  TRYOUT  DATA 

The tryout step was originally conceived of as the validation of
Skill Qualification Tests, and the earlier steps as test construction.
Experience gained during the past two years, however, has shown that for
criterion-referenced tests, validation encompasses the entire
development process.

The guiding principle of the developmental process is consistency
of measurement. Experts must agree on the relevance of the test
content to job requirements and the appropriateness of test items to task
behaviors. In the tryout on soldiers, the scorable units must be
consistent indicators of ability to perform the task. For written scorable
units, each item in a scorable unit is first correlated with an
independent estimate of ability to perform the task, and then with the other
items in the scorable unit. The external estimates of ability to perform
the task are self-ratings obtained through standard questions. Up to 30
soldiers are included in the sample to determine consistency of
measurement for each scorable unit. The analysis consists of computing an
Agreement Index for each item and scorable unit:

                                       Self-rating
                                 Performer   Nonperformer

Item or          Pass or GO          a            b
Scorable Unit    Fail or NO-GO       c            d

a, b, c, and d are cell frequencies.

Agreement Index = ad - bc; if the Agreement Index is positive, then the item or
scorable unit is satisfactory; if the Agreement Index is negative, then the item or
scorable unit is unsatisfactory, and must be examined for revision.
A second analysis involves examining patterns of Agreement Indices
for items in a scorable unit. Items that have positive Agreement
Indices are satisfactory, and items with negative Agreement Indices must
be examined for revision.
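
As a rough illustration of the Agreement Index computation, the Python sketch below tabulates the four cell frequencies from paired observations and returns ad - bc. It assumes the conventional layout in which a counts soldiers who pass and rate themselves performers, b those who pass and rate themselves nonperformers, c those who fail and rate themselves performers, and d those who fail and rate themselves nonperformers. The function names and the sample data are hypothetical.

```python
from typing import Sequence, Tuple


def cell_frequencies(passed: Sequence[bool],
                     performer: Sequence[bool]) -> Tuple[int, int, int, int]:
    """Tabulate (a, b, c, d) for one item or scorable unit.

    passed[i]    -- True if soldier i passed the item (or was GO on the unit)
    performer[i] -- True if soldier i self-rated as able to perform the task
    """
    if len(passed) != len(performer):
        raise ValueError("both sequences must describe the same soldiers")
    a = b = c = d = 0
    for did_pass, is_performer in zip(passed, performer):
        if did_pass and is_performer:
            a += 1          # pass, performer
        elif did_pass:
            b += 1          # pass, nonperformer
        elif is_performer:
            c += 1          # fail, performer
        else:
            d += 1          # fail, nonperformer
    return a, b, c, d


def agreement_index(a: int, b: int, c: int, d: int) -> int:
    """Agreement Index = ad - bc."""
    return a * d - b * c


# Made-up tryout data for eight soldiers (illustration only).
passed = [True, True, True, False, False, True, False, False]
performer = [True, True, True, True, False, False, False, False]
cells = cell_frequencies(passed, performer)
print(cells, agreement_index(*cells))
```

A positive result means passing and self-rated ability tend to go together; a negative result flags the item for review.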

SQT  ISSUES  STILL  UNDER  REVIEW 

1.  Is  the  Agreement  Index  an  appropriate  statistic  to  evaluate  the 
quality  of  written  items  and  scorable  units? 

2. For written scorable units, standards of performance are set
arbitrarily, e.g., 3 of 4 items must be passed to be GO on a scorable
unit. Are there statistical techniques to indicate level of mastery
that can be readily employed by test developers who are not trained in
statistics?

3.  Are  there  alternative  procedures  for  collecting  and  analyzing 
data  on  the  satisfactoriness  of  written  items  and  scorable  units,  which 
are  also  sensitive  to  the  requirement  of  fixed  test  content? 

4. Are there more appropriate ways of combining scores from items
and scorable units into a total test score that indicates level of job
proficiency?


ANALYSIS  OF  MAN-MACHINE  INTERFACE  INFORMATION 
IN  CURRENT  COMMUNICATIONS  SYSTEMS 


R.  J.  D'Accardi  and  H.  S.  Bennett,  US  Army  Communications 

Research  and  Development  Command,  Fort  Monmouth,  New  Jersey 

C.  F.  Tsokos,  Department  of  Mathematics,  University  of  South 

Florida,  Tampa,  Florida 

ABSTRACT. Experiments dealing with man-machine interface problems occurring
in tactical communications systems have been conducted at Ft. Monmouth, NJ.
The thrust of the study was to characterize the human element of a
sophisticated system by varying the environmental factors of ambient light and
acoustic noise and observing quantitative changes in operator performance.
Specifically, the number of errors committed by a communications systems
operator was observed as a function of the environmental factors. The
equipment used consisted of the standard teletypewriter terminal and an optical
display terminal.

The object of this presentation is threefold: First, we discuss the
importance of human factors in system development and briefly review the
experimental design. Secondly, we present a non-linear regression model
and error matrices which can be used to predict operator performance as a
function of the environmental factors of ambient light and acoustic noise,
and thirdly, time series models are presented for the optical display
terminal to illustrate the usefulness of characterizing, within reason,
the error performance of a terminal operator working in a wide variety of
environments.

I.  THE  IMPORTANCE  OF  HUMAN  FACTORS  IN  SYSTEM  DEVELOPMENT. 

It is interesting to note that human factors studies in the Army can
be traced back to World War I. It was found at that time that in the
fledgling British Air Service 90% of all fatal accidents were the result
of individual pilot deficiencies and only 2% were the result of combat (the
remaining 8% were due to materiel deficiencies). This fact led the US
Army to establish a laboratory designed to study problems (including the
human factors aspects) connected with flying. It was called the Research
Board of the Army Signal Corps Air Service and was established in
October 1917. It was quickly followed by the School of Aviation Medicine
in 1918 (now the School of Aerospace Medicine at Brooks Air Force Base,
Texas) and the Physiological Research Laboratory (now the Aerospace
Medical Research Laboratories at Wright-Patterson Air Force Base).

By the time World War II began the human factors field had been taken
over by industrial engineers and industrial psychologists (e.g., Taylor,
Gantt, and the Gilbreths). It was World War II, however, with its
quantum jump in the technological complexity of man/machine systems,
which set the mold and pattern for present-day human factors
engineering.


The  mission  of  modern  day  studies  of  man/machine  environmental  factors 
has  five  aspects: 

a) It is connected with the contributions of the man/machine
interface to the over-all performance of the system. As the
systems become more complex, they also become more vulnerable to
catastrophic failures (shades of the power blackouts!) and the man/machine
interface is a critically vulnerable point in such systems.

b) It must be concerned with the translation of broad system
operational requirements into specific man/machine interface functional
requirements. For example, how do you "get down to cases" in an air
defense vigilance task with operator requirements when all you know is
that a "bogie" must not get through even though it may occur only once
in a 24 hr. day?

c) The human factors engineer must be involved in the
promulgation of training and personnel selection criteria. If, in a complex
system, the status of the man/machine interface is critical to the
over-all system performance, then the qualifications, job description, and
needed skills for the human elements of the system (including maintenance
as well as operation) must be a major duty of human factors engineering.

d) As most modern complex systems are relatively costly, it
behooves the human factors and systems engineer to model, whenever
possible, the system under consideration. Such models must be flexible
enough to incorporate a realistic (and usually non-ergodic)
representation of the man/machine interfaces. Analysis of data secured from these
models is also the concern of the human factors engineer.

e) Finally, although modeling may be the norm for analysis of
complex systems, the human factors engineer must never lose touch with
the real world. Therefore, whenever feasible, he should be involved in
actual system performance tests and in the analyses of the resulting data.

The work being reported on in this paper is in line with several of
the above listed missions, and, in particular, the one under subparagraph
"d" above. Before considering the details of the research, one should
consider the environment in which the interface under study is immersed.
This environment is described as a hierarchical command/control system. A
generalized C² system model must make provision for sensing, filtering,
analysis, decision making, and feedback at each level in the hierarchy of
command. However, since each level in the hierarchy must feed information
upwards in the chain of command and effector-action commands downward, the
resultant loops are imbedded in a hierarchical fashion.
Let us trace one such loop. Imagine a line or field of sensors
(REMBASS) at the FEBA intended to warn of enemy overland approach. In addition,
let us visualize airborne reconnaissance (drone and manned), behind the
lines intelligence operations, prisoner interrogation, signal intercept
operations and the like. All of this "sensor" information must be filtered,
classified, and appropriately analyzed and correlated for presentation to
a commander for decision as to appropriate effector action (retreat,
advance, hold, encircle, etc.). Once the effector action is ordered, the
resulting movements and actions must be reported through the original
information gathering network, as well as through command channel status
reports, so that further or modified effector action may result. Thus,
we have a reentrant feedback loop continuously in action. If we visualize
this situation (sensing, decision, effector action, feedback) as occurring
at least at each level of command (company, battalion, division, corps),
then the significance of the imbedded or hierarchical nature of the multiple
feedback loops becomes evident.

How does the particular man/machine interface being reported upon in
the paper fit into the above? At almost every stage of information flow
there is a point where multiple channels of information must be
consolidated and summarized so as to form a new message. A common denominator at
these points is the message center, and in particular, field or forward
area message centers. The operators in such message centers operate under
a combination of stressful environmental factors: acoustic noise, poor
light, fear of bodily harm, etc. The subject study attempts to simulate
under controlled conditions the first two factors and to substitute for
fear of bodily harm a fear-of-failure situation by giving the operators
who are taking part in the simulation a series of tasks which are greater
in amount than the time allotted for their accomplishment.

This is the general scenario and motivation for the study. Now let us
proceed with a discussion of the results which were to be realized from
the data gathered. Since in a simulation one cannot hope to achieve all
the detailed conditions possible, it was the purpose of this study to come
up with a predictor model which would allow for insertion of other
permutations and combinations of the considered conditions and then to predict
operator performance under these new conditions.

II. DESIGN OF THE EXPERIMENT.

The details of the experimental design were reported in the Proceedings
of the Twenty-first Conference on the Design of Experiments in Army Research
Development and Testing, ARO Report 76-2, pp 13-29, May 1976. What follows
in this section is a general summary of the experiment.


The significance of acoustic noise and ambient light on operator
performance was investigated using both an optical display transmission
device and a standard teletypewriter. Primarily, the visual display
terminal is a developmental equipment intended to visually present
messages on a CRT display where an operator can see and correct his
message prior to transmission.

The experiment consisted of testing the transcription accuracy of six
experienced communications-center operators under 16 different combinations
of environment. Ambient light was varied at four levels, ranging from
24 ft-candles to 3 ft-candles, and acoustic noise was concurrently varied
at four sound-pressure levels ranging from 55 dBa to 95 dBa. The 55 dBa
level was considered the quiet condition and the 95 dBa level represented
an extremely annoying and distracting "pink" noise. The chosen ambient
light levels of 24, 12, 6, and 3 ft-candles, respectively, represented
successively deteriorating lighting conditions.

The messages for the experiment consisted of forty random-letter word
groups of five characters each. They were derived through a random number
generator and an alpha-numeric conversion. No message was a duplicate, nor
was any message repeated for any of the operators on either terminal equipment.
The aim of the experiment was to vary the environmental variables and to
observe the transcription accuracy of each operator utilizing the visual
display terminal as a function of time. The response variable, accuracy
(number of committed errors), was the measure of transcription errors that
each operator committed per four-second interval. The results were
compared to an acceptable operator norm, i.e., typing a message format on a
standard teletype terminal under the same conditions. Each operator was
tested in four sessions, each session programmed for eight random
environmental combinations, four for each terminal equipment. See Table 1. The
tests were alternated between the optical display unit and the standard
teletypewriter to reduce the effects of learning. A thirty minute
familiarization period was given each operator prior to the tests.
TABLE 1

TREATMENT SCHEDULE PER OPERATOR

                 Environmental Treatment* Combinations
Session      Optical Display Terminal      Teletype Terminal

[Schedule entries are not recoverable from the source.]

*Treatment = (Ambient Light Level, Acoustic Noise Level)

Level      Ambient Light Value      Acoustic Noise Value
  1          24 ft-candles               55 dBa
  2          12 ft-candles               70 dBa
  3           6 ft-candles               80 dBa
  4           3 ft-candles               95 dBa


III.  A NON-LINEAR  REGRESSION  MODEL  FOR  MAN-MACHINE  INTERFACE. 


In this section an acceptable model to predict operator performance
is presented so that one can determine the environmental combination of
ambient light and acoustic noise which generally causes a minimum number
of committed errors. Various linear, multiple linear, and non-linear
models were tested for both terminals. The criterion used for choosing
the best model was the minimum SSE (sum of squares for error), where

$$\mathrm{SSE} = \sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2 ,$$

and $Y_i$ = observed errors,

$\hat{Y}_i$ = predicted errors.

The general model that best describes the observed data is of the
form:

Y ■ B°  + + Sj^j  + Bixxx*  + 

+ S„X2  + 81X1X2  + 87X1X2  + B,x*X2  + BiX*  + 810X2  + ej 

where $Y$ = average number of errors (operator performance) per
cell,

$X_1$ = ambient light level,

$X_2$ = acoustic noise level,

$\beta_i$ = model coefficients, $i = 0, 1, \ldots, 10$,

$\varepsilon_j$ = experimental error, $j = 1, \ldots, n$ (the extent to which
the observed data and the model disagree, where the $\varepsilon_j$'s
are independent and $\varepsilon \sim N(0, \sigma^2 I)$), and

$n = 16$.

The estimated values of the coefficients, error variance, correlation
and appropriate F statistic for both terminals are summarized in the
following table:

Parameter             Optical Display Terminal    Teletypewriter Terminal

$\beta_0$                   34.7500                    -7.793
$\beta_1$                    0.5092                    -6.365
$\beta_2$                   -1.0840                     1.018
$\beta_3$                   -0.0399                     0.1588
$\beta_4$                    0.0359                     0.1663
$\beta_5$                    0.0137                    -0.02055
$\beta_6$                    0.0002373                 -0.0007769
$\beta_7$                    0.001990                  -0.004906
$\beta_8$                   -0.000011                   0.00002257
$\beta_9$                    0.003293                   0.001425
$\beta_{10}$                 0.000053                   0.0001133
SSE                          5.136                      3.389
$S^2$                        1.027                      0.6779
F (MODEL)                    2.735                      6.536
$R^2_{y\hat{y}}$             0.8455                     0.9289
$R_{y\hat{y}}$               0.9195                     0.9638

In the case of the optical display terminal, the F statistic
indicates a possible overabundance of variables. In the case of the
teletypewriter terminal, the small SSE, large $R^2_{y\hat{y}}$, and relatively
small F statistic indicate an acceptable model.

Now we begin to investigate the possibility of eliminating those
variables that do not significantly contribute to the dependent variable.
The procedure used to form the reduced models was the "forward selection
procedure," which begins with the variable $x_i$ that has the highest
correlation $\rho_{x_i y}$ with $y$. Next, the partial correlation coefficients of
the remaining $x_j$ and $y$, $\rho(x_j y \mid x_i)$, $j \neq i$, are calculated. The $x_j$ with
the greatest $\rho(x_j y \mid x_i)$ is selected to enter the regression equation.

This process is continued, and as each variable is entered into the
equation, the multiple correlation coefficient $R^2_{y\hat{y}}$ and the partial F
test value for the most recent entry are examined. In the first case,
one checks to assure a relatively insignificant change in $R^2_{y\hat{y}}$, and,
secondly, whether or not the inserted variable has taken up a significant
amount of variation over the previous variables in the regression model.
When the partial F test becomes insignificant (the SSE is sufficiently
reduced) and $R^2_{y\hat{y}}$ is not very different from that of the "full model", the
process is terminated. The reduced model, therefore, contains all
significant variables plus the first two insignificant variables to accommodate
any error due to the estimates.
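
The forward selection idea can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions and is not the authors' program: the data are synthetic, the candidate term names are hypothetical, the F-to-enter threshold of 4.0 is an arbitrary choice, and the sketch stops at the first insignificant term instead of retaining two extra terms as described in the text. Choosing the term with the largest reduction in SSE at each step is equivalent to choosing the term with the largest absolute partial correlation with y given the terms already entered.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def remove_component(v, w):
    """Return v minus its least-squares projection onto w."""
    scale = dot(v, w) / dot(w, w)
    return [a - scale * b for a, b in zip(v, w)]


def forward_select(candidates, y, f_to_enter=4.0):
    """Forward selection by partial F test; returns (entered names, final SSE)."""
    n = len(y)
    ones = [1.0] * n
    # Start from the intercept-only model: centre y and every candidate term.
    y_res = remove_component(y, ones)
    pool = {name: remove_component(col, ones) for name, col in candidates.items()}
    sse = dot(y_res, y_res)
    chosen = []
    while pool:
        # SSE reduction each remaining term would give if entered next.
        gains = {}
        for name, col in pool.items():
            norm = dot(col, col)
            if norm > 1e-9:                  # skip terms collinear with those entered
                gains[name] = dot(col, y_res) ** 2 / norm
        if not gains:
            break
        best = max(gains, key=gains.get)
        new_sse = sse - gains[best]
        df_error = n - (len(chosen) + 2)     # intercept + entered terms + new term
        if df_error <= 0 or new_sse <= 0:
            break
        partial_f = gains[best] / (new_sse / df_error)
        if partial_f < f_to_enter:
            break
        # Enter the term, then orthogonalise y and the remaining terms against it.
        best_col = pool.pop(best)
        y_res = remove_component(y_res, best_col)
        pool = {name: remove_component(col, best_col) for name, col in pool.items()}
        chosen.append(best)
        sse = new_sse
    return chosen, sse


# Synthetic 4 x 4 design using the light and noise levels of the experiment.
rng = random.Random(0)
x1 = [float(l) for l in (24, 12, 6, 3) for _ in range(4)]
x2 = [float(s) for _ in range(4) for s in (55, 70, 80, 95)]
y = [10.0 - 0.5 * l + rng.gauss(0.0, 0.1) for l in x1]   # made-up response
candidates = {
    "x1": x1,
    "x2": x2,
    "x1^2": [l * l for l in x1],
    "x2^2": [s * s for s in x2],
    "x1*x2": [l * s for l, s in zip(x1, x2)],
}
chosen, final_sse = forward_select(candidates, y)
print(chosen, final_sse)
```

Because the synthetic response depends linearly on x1, that term enters first; whether any further term enters depends on the random noise and the chosen threshold.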

Based on the general model previously stated, the appropriate
reduced models which characterize operator performance for both terminals
are as follows:

i) for the optical display terminal:

Y ■ 0j  + + e*Yi + + e 


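where the five coefficient estimates, in the order of the terms above, are

10.63, -0.1239, 0.000028, -0.0002202, 0.000008367,

with: SSE = 8.678,

$S^2$ = 0.7889,

F(MODEL) = 7.783,

$R^2_{y\hat{y}}$ = 0.7389,

$R_{y\hat{y}}$ = 0.8596.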
ii) for the teletypewriter terminal:

y ■ + e*Xi  + eixiXa  + b,1x!x2  + Bixi  + Bjxl  + e 

where: $\beta_0$ = 3.211,

$\beta_1$ = -1.365,

$\beta_2$ = 0.03532,

$\beta_3$ = -0.001288,

$\beta_4$ = 0.002123,

$\beta_5$ = -0.000004273,

with: SSE = 7.63,

$S^2$ = 0.763,

F(MODEL) = 10.5,

$R^2_{y\hat{y}}$ = 0.84,

$R_{y\hat{y}}$ = 0.9165.

The reduced models now provide the capability to predict the number of
transcription errors given the desired combination of ambient light and
acoustic noise.

IV.  OPTIMAL  LIGHT  AND  SOUND  LEVELS. 


One can now attempt to find the light-sound combination that causes
the least number of errors to be committed. The method used was simply
to evaluate the predicted value of Y for ordered pairs (X1, X2), where
X1 assumes all integer values from 1 to 26, and X2 assumes even integer
values from 50 to 100. These ranges of X1 and X2 were chosen based upon
the levels of X1 and X2 used in the experiment. Thus, the reduced models
are used to provide a reasonable extrapolation outside the tested
environmental limits.
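The grid evaluation described above can be sketched as follows. This is a minimal illustration, not the authors' program: the function `predicted_errors` and its coefficients are hypothetical placeholders standing in for a fitted reduced model, and only the grid itself (X1 = 1, 2, ..., 26 and X2 = 50, 52, ..., 100) is taken from the text.

```python
def predicted_errors(x1, x2):
    """Hypothetical response surface (NOT the fitted model from the paper)."""
    return 0.01 * (x1 - 20) ** 2 + 0.002 * (x2 - 50) ** 2 + 4.0


def grid_minimum(model, x1_values, x2_values):
    """Evaluate model on every (x1, x2) pair; return (min value, x1, x2)."""
    best = None
    for x1 in x1_values:
        for x2 in x2_values:
            y = model(x1, x2)
            if best is None or y < best[0]:
                best = (y, x1, x2)
    return best


light_levels = range(1, 27)        # X1: all integers from 1 to 26
noise_levels = range(50, 101, 2)   # X2: even integers from 50 to 100

y_min, x1_min, x2_min = grid_minimum(predicted_errors, light_levels, noise_levels)
print(f"minimum predicted errors {y_min:.2f} at X1={x1_min}, X2={x2_min}")
```

An exhaustive search is adequate here because the grid holds only 26 x 26 = 676 points; the same loop also produces the full matrix of predicted values if every `y` is stored rather than only the smallest.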

The predicted Y values, i.e., the predicted number of errors, were
calculated for the environmental combinations described in Section II for
the optical display data (using the reduced model) to obtain the matrix of
Table 2. Visual examination of this matrix shows that the minimum number
of errors, i.e., 4.4, will occur at a light level of 24 ft-candles and a
concurrent acoustic noise level of 54 dBa, or, if we are willing to
extrapolate slightly outside the region from which data have been obtained,
the absolute minimum, 3.8, occurs at 26 ft-candles and 50 dBa. Thus, one
can conclude that the minimum number of errors committed on the optical
terminal (in the region for which data were taken) occurs at the minimum
sound and maximum light combinations, that is, 26 ft-candles/55 dBa.


A similar matrix of predicted errors was computed for the reduced
teletypewriter model, and is shown in Table 3. In this case, visual
examination shows that the minimum number of predicted errors occurs at
a light level of about 16-17 ft-candles and at a concurrent sound level
of about 55 dBa. In both cases (optical display and teletypewriter) the
results of the minima were expected. It is to be noted, however, that in
a tactical situation the environmental factors of ambient light and
acoustic noise are far from optimal. Thus, one can conclude from the
matrices that for a wide variety of the environmental factors X1 and X2,
one can predict how well experienced communicators will perform.


TABLE 3. PREDICTED ERROR PERFORMANCE FOR VARIOUS LEVELS OF AMBIENT
LIGHT AND ACOUSTIC NOISE FOR THE TELETYPEWRITER TERMINAL


V.  TIME-SERIES  MODELING  OF  MAN/MACHINE  INTERFACES. 


The best non-linear regression model presented in the previous section
dealt with the prediction of the number of committed errors as a function
of two environmental variables, namely, light and sound. More often, the
communications engineer is interested in such factors as performance and
efficiency as a function of time. Thus, utilizing time-series models, it
may be possible to characterize a group of operators either singly or as a
whole for predicting the number of committed errors at times t1, t2, t3,
..., tn in the future. The time-series approach for this type of
information is somewhat unique in that not many attempts have been made to
implement this methodology in analyzing time-dependent man-machine
interface data. In view of this uniqueness, a number of shortcomings
were experienced. One of the most serious limitations was the sample
size. However, enough information is available so that one can introduce
the time-series methodology into this particular subject area. This
approach is extremely useful because it characterizes, within reason, the
error performance of any communications terminal equipment operator
working in a wide variety of environments.

Incorporated into the design of the experiment was a four-second time
interval counter. This provided a running count of the number of
transcribed errors in each four-second time period for the duration of the
test. Thus, thirty-two non-deterministic time series were created (sixteen
per terminal, one corresponding to each combination of environmental
factors).

Of the time series so obtained, the two most critical environmental
combinations are presented, namely, (1,4) and (4,4) (refer to Section II).
Criticality was determined by the degree of non-stationarity of the series,
or in other words, the amount of filtering required to bring the process
into statistical equilibrium.

Clearly, the time-series characterization of the data is very promising
from the point of view of affording the communications system designer
and planner a means to predict the human element of the total communications
system architecture. The following stochastic formulations were found to be
adequate in characterizing the underlying process of error performance:

a. for the (1,4) environment, teletypewriter terminal, we obtained
the mixed autoregressive-moving averages (ARMA) model:

X_t = -0.046 + 0.660 X_{t-1} + 0.367 X_{t-2} + Z_t + 0.449 Z_{t-1}
      + 0.223 Z_{t-2} + 0.422 Z_{t-3},

b. for the (1,4) environment, optical display terminal, the third
order autoregressive (AR) model obtained was:

X_t = + 0.254 X_{t-1} + 0.133 X_{t-2} + 0.355 X_{t-3} + 0.258 X_{t-4} + Z_t,

c. for the (4,4) environment, teletypewriter terminal, we obtained
another mixed model (ARMA):

X_t = 0.006 + l^S X_{t-1} - 0.570 X_{t-2} - 0.215 X_{t-3} + Z_t
      + 0.950 Z_{t-1} + 0.191 Z_{t-2} + 0.016 Z_{t-3},

d. and, finally, for the (4,4) environment, optical display terminal,
the third order moving-averages (MA) process obtained was:

X_t = 2.158 + Z_t - 0.453 Z_{t-1} + 0.023 Z_{t-2} + 0.051 Z_{t-3}.
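As an illustration of how such a fitted process can be exercised, the sketch below simulates the moving-averages model (d), read here as X_t = 2.158 + Z_t - 0.453 Z_{t-1} + 0.023 Z_{t-2} + 0.051 Z_{t-3}. It is only a sketch under stated assumptions: the paper does not give the variance of the shocks Z_t, so unit variance is assumed, shocks before the start of the series are taken as zero, and the function name `simulate_ma` is introduced here for illustration.

```python
import random


def simulate_ma(constant, thetas, shocks):
    """Return X_t = constant + Z_t + sum_k thetas[k-1] * Z_{t-k}.

    Shocks before the start of the series are taken to be zero.
    """
    series = []
    for t, z_t in enumerate(shocks):
        x_t = constant + z_t
        for k, theta in enumerate(thetas, start=1):
            if t - k >= 0:
                x_t += theta * shocks[t - k]
        series.append(x_t)
    return series


# Coefficients of model (d); the unit shock variance below is an assumption.
MA_CONSTANT = 2.158
MA_THETAS = [-0.453, 0.023, 0.051]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
shocks = [rng.gauss(0.0, 1.0) for _ in range(200)]
simulated = simulate_ma(MA_CONSTANT, MA_THETAS, shocks)
print(sum(simulated) / len(simulated))  # sample mean of the simulated series
```

Feeding a single unit shock, `simulate_ma(2.158, MA_THETAS, [1.0, 0.0, 0.0, 0.0, 0.0])`, traces the impulse response of the process: the series returns to the constant 2.158 after three intervals, which is the defining property of a third order moving-averages model.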

To illustrate the adequacy of the models, Figures 1 and 2 graphically
display the observed and simulated information for the optical display
terminal (ODT). These particular presentations were chosen because of the
projected role of the ODT in future communications systems. The details
of the teletypewriter terminal analysis and a comparison to the ODT will
be presented at a later date.

One of the implied features of this research is that for each
environmental combination, no common realization, either ARMA, MA, or AR,
was obtained to characterize operator performance. One can conclude,
therefore, that even with an adequately developed procedure for analysis,
more than one characterization may be required to evaluate the human
subsystem in sophisticated communications systems. The procedures developed
clearly provide a realistic view of the complex man-machine interface that
occurs in current communications systems.


BIBLIOGRAPHY 

1. "Design of Experiments Dealing with Man-Machine Interfaces in Current
Communication Systems", R. J. D'Accardi, H. S. Bennett, and R. S.
Hennessy. Proceedings of the Twenty-first Conference on the Design
of Experiments in Army Research Development and Testing. ARO Report
76-2, May 1976.

2. "Probability and Statistics for Engineers and Scientists", Walpole,
R. E., and R. H. Myers, Macmillan, New York, 1972.

3. "Time Series Analysis: Forecasting and Control", Box, G. E. P., and G. M.
Jenkins, Holden-Day, San Francisco, 1970.

4. "Applied Regression Analysis", Draper, N. R. and H. Smith, John Wiley,
New York, 1966.


FIGURE 2. SIMULATED MAN/MACHINE INTERFACE SERIES USING THE MOVING-AVERAGES
MODEL vs. THE OBSERVED INFORMATION FOR THE OPTICAL DISPLAY TERMINAL,
4,4 ENVIRONMENT (panel title: FILTERED MODEL; abscissa: Four Second Time
Intervals)


IMPROVED QUANTIFICATION OF PLAYER
EFFECTS IN EXPERIMENTAL DESIGN

ABSTRACT. This paper discusses and illustrates a methodological
alternative to standard experimental designs for use in certain applications.
Focus is on the quantification of random subject or player effects to be used
in place of dummy (1, 0) variables in the usual linear model assumed for
analysis of variance. The advantages of this approach are: (1) increasing
the efficiency of the analysis; (2) providing explanation of player differences;
(3) forming a base for the evaluation of adjusted treatment effects; and (4)
logical formulization for extrapolations to other populations of individuals
for increased utility of the results.

The  reader  is  assumed  to  have  some  familiarity  with  the  statistical 
analysis  of  experimental  data. 

I.  A MIXED  EFFECTS  MODEL. 

Consider the Mixed Effects Model

y_ij = μ + τ_i + β_j + ε_ij                                        (1.1)

where τ_i is the fixed differential effect of the i-th treatment, i = 1, ...,
p; the β_j, j = 1, ..., q, are random block effects which are assumed normally¹
and independently distributed with E(β_j) = 0 and variance(β_j) = σ_β², i.e.,
β_j : NID(0, σ_β²); for the model error, ε_ij, it is assumed that
ε_ij : NID(0, σ_ε²). The dependent variable is y_ij, while μ is the base from
which differential effects are measured.
¹The normality assumption is required for tests of significance, not for
estimation.

II. THE INDIVIDUAL AS A BLOCK. In many applications, the block is
defined by an individual who is subjected to some or all of the p treatments
in succession. For example, in clinical trials, cross-over designs² are used
to compare drugs (treatments) by subjecting each individual to each drug in
succession.³ A second example is in military field testing, where each
participant is subjected to fire by each of the different weapon types
(treatments); the dependent variable is some measure of suppression per
treatment.

III. TREATMENT BY BLOCK INTERACTIONS. In designing experiments
characterized by these examples, it is hoped that treatments and blocks do
not interact. Assuming non-significance, this interaction becomes the inherent
model error, ε_ij. If, on the other hand, this interaction is expected to be
significant, then replication should be incorporated in the experiment so that
this interaction can be estimated. Replication, however, poses problems in the
examples just discussed. In the drug experiment, replicating the individual
may induce complex carry-over effects of drugs. In the military experiment,
replication may induce learning and/or boredom effects so that repeated use
of an individual within a treatment level does not constitute a replication
in the statistical sense of the word. Consequently, many experiments are
designed under the assumption of no block by treatment interaction when prior
logic is to the contrary. In fact, it is not unreasonable to expect
individuals to react differently to treatments in many situations. For
example, in military field experimentation, subjecting different individuals
to the same experimental situation will produce different responses depending
upon an individual's military experience, degree of enthusiasm, mental
aptitude, physical sensitivity (hearing, eyesight, etc.), and physical
endurance. Treatments which are sensitive to any of these attributes will
lead to an interaction of block (player) by treatment since these attributes
will differ from player to player.

²Cross-over designs (see (1)) are sometimes favored over parallel designs,
wherein individuals are maintained on the same treatment over the entire
period of experimentation; i.e., individuals are nested within treatments.
With cross-over designs, differences between individuals are neutralized in
comparing treatments, given that certain assumptions are met.

³The ordering of drugs will generally differ between groups of individuals
so as to allow for estimation of carry-over effects.


IV. QUANTIFYING THE BLOCK EFFECT IN TERMS OF A DUMMY VARIABLE. In
model (1.1), the block effect quantifies the individual in terms of a dummy
(0,1) variable. The intent of these variables is to isolate the
between-individuals source of variation so as to increase the efficiency of
the analysis. Note, however, that under model (1.1), no attempt is made to
distinguish between differences in physiological or psychological states
within individuals; i.e., the state of the individual may vary during times
when different treatments are administered to him. The result of the (0,1)
dummy (block) variable analysis is the estimation of an "average" effect for
each individual. If these states vary substantially during experimentation,
the efficiency of the analysis corresponding to model (1.1) decreases relative
to the case where "adjustments" are made for these states through the use of
covariables that quantify them. Moreover, variations in these states during
experimentation, which realistically cannot be controlled, only measured, can
lead to serious biases in comparing treatments when a particular state
predominates within one treatment relative to another.

V.  OTHER  METHODS  OF  QUANTIFYING  THE  INDIVIDUAL.  How  does  one  quantify 
the  individual  other  than  through  dummy  variables?  A general  answer  is  through 
covariables  while  a specific  answer  lies  in  the  particular  application.  In  drug 
experiments,  measures  drawn  from  the  blood  and/or  urine  serve  to  quantify  the 
individual.  In  the  military  field  test,  the  individual  partially  quantifies 
himself  in  terms  of  his  responses  to  psychological  questions. 

If the psychological or physiological states are not expected to vary
significantly within individuals over the course of the experiment,
quantification of the individual may be required only once, say prior to the
application of the first treatment. Replacing the block dummy variables with
the covariables quantifying the individual serves several purposes. Firstly,
the covariables explain differences between individuals whereas dummy
variables do not. Secondly, assuming that only a few covariables are required
to adequately quantify the person, the replacement of the dummy variables by
the covariables adds to the error degrees of freedom and hence to the power
of the test. Thirdly, more freedom is allowed to estimate treatment by player
covariable(s) interaction in lieu of the previously mentioned treatment by
block interaction. Finally, the possibility exists to use these covariables
as a logical formulization for extrapolation to other populations of players,
thereby enhancing the utility of the results.
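The gain in error degrees of freedom mentioned above can be made concrete with a small count. The sketch below assumes one observation per treatment-player cell, no interaction terms, and covariables entering as main effects only; the function names are introduced here for illustration. The values p = 7 treatments and q = 31 players are those of the suppression experiment described later in the paper, while r = 3 covariables is an illustrative choice.

```python
def error_df_with_block_dummies(p, q):
    """Error df for p treatments and q blocks fitted with (0,1) dummies.

    Total df (pq - 1) less treatments (p - 1) less blocks (q - 1).
    """
    return (p * q - 1) - (p - 1) - (q - 1)


def error_df_with_covariables(p, q, r):
    """Error df when the q - 1 block dummies are replaced by r covariables."""
    return (p * q - 1) - (p - 1) - r


p, q, r = 7, 31, 3  # r = 3 is an assumed number of covariables
print(error_df_with_block_dummies(p, q))   # 180
print(error_df_with_covariables(p, q, r))  # 207
```

Under these assumptions the replacement frees (q - 1) - r = 27 degrees of freedom, which can either stay in the error term or be spent on treatment by covariable interactions.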

Then in place of model (1.1),

y_ik = α_0 + τ_i + Σ_{k=1}^{r} α_k x_ik + δ_ik                     (5.1)

may be applied, where the x_ik denote covariables which quantify the
individual, k = 1, ..., r; the α_k are regression coefficients; and the δ_ik
are model errors with the usual assumptions that accompany the model for
tests of significance.

Model (5.1) holds if the states fluctuate widely within individuals,
though in this case, quantification of the individual should take place just
prior to each treatment application, not after. If the individual is
quantified following treatment, there is the possibility of treatments
affecting the covariables. In this event, direct and indirect treatment
effects may have to be considered through a system of structural regression
equations; e.g., model (5.1) and

(5.2)

could form a system where τ_i of (5.1) is the direct i-th treatment effect
on y, τ_ki of (5.2) is the direct i-th treatment effect on x_k, and ...
is the overall i-th treatment effect on y; see Mallios (2).


For the case of significant block by treatment interactions with player
quantification taking place prior to each treatment application, the model would
take the form

y_ik = γ_0 + τ_i + γ_k x_ik + (τγ)_ik x_ik + ε_ik                  (5.3)

where the γ_k are regression coefficients. The (τγ)_ik allow for the γ_k to
differ between treatments. Here, note that with this formulization, replication
within a treatment is not necessary, since the repetition aspect is through
the communality, provided overlapping exists between treatments, of the
responses.

VI. QUANTIFYING A PLAYER'S PROPENSITY TO PARTICIPATE IN A SUPPRESSION
EXPERIMENT. An experiment was conducted to evaluate individuals'
assessment of danger when fired upon under different conditions while situated
in foxholes. The seven treatments included overhead fire by different small
arms with varying bursts at varying ranges. Initially, 31 participants were
rehearsed and prebriefed on experimental objectives and techniques.
Thereupon, all 31 were situated in separate foxholes and were simultaneously
subjected to each treatment over seven distinct trials. Following each trial,
each player gave an assessment as to whether the particular treatment was
"very dangerous", "quite dangerous", "fairly dangerous", or "not very dangerous".

Prior to each trial, each player answered the series of questions in
Table 6.1. Their answers were intended to give measure to the player's
propensity to participate in the experiment. In very loose terms, the answers
give measure to player motivation.

Note that most of the questions are directed at short term attitude
changes rather than long term changes; e.g., the player could be bored, tired,
or hungry on one trial but not another. Table 6.1 presents the percentage of
yes responses over all players and all trials. Due to the high percentage of
yes responses, questions 4 and 8 were deleted from further consideration.

Since the questions were answered on a per trial basis, it must be
established that the questions had the same meaning between trials or that
relations between questions remained the same over trials. Accordingly, based
on quantifying "yes", no answer, and "no" responses according to -1, 0, and 1,
a 10 by 10 covariance matrix, say S_i, based on questionnaire responses was
calculated for each trial. Let S_i estimate Σ_i. Then relations between
questions differ between trials if the hypothesis

H_0 : Σ_1 = ... = Σ_7

is rejected. Using the likelihood ratio criterion (see (3)), H_0 was not
rejected. Consequently, all the questionnaire data were pooled into one
covariance matrix (based on differing mean vectors per trial), say S.

The matrix S was converted to C, the matrix of simple correlations,
and a principal component analysis (3) was performed. The three eigenvectors
associated with the three largest eigenroots are given in Table 6.1. These
eigenvectors are part of a principal component analysis and provide a
redimensioning of the original questions to isolate the inherent pattern in
the responses to the questions. Thus, the eigenvector associated with the
largest eigenroot represents the linear combination of the original responses
which had the most variability. These eigenvectors are then used to generate
the values of the covariables. On a subjective basis, these eigenvectors are
designated as indices relating to experiment validity, to player discomfort,
and to trial structure.
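The steps just described (covariance matrix to correlation matrix, leading eigenvector, score per player) can be sketched in a few lines. This is an illustration only: the 3 by 3 covariance matrix and the response vector are hypothetical, power iteration is used in place of whatever eigenvalue routine the authors employed, and the score is formed from raw coded responses as the paper's footnote on scores suggests. The function names are introduced here.

```python
import math


def covariance_to_correlation(s):
    """Convert a covariance matrix S into the correlation matrix C."""
    n = len(s)
    sd = [math.sqrt(s[i][i]) for i in range(n)]
    return [[s[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]


def leading_eigenpair(matrix, iterations=500):
    """Largest eigenroot and its unit eigenvector by power iteration.

    Assumes the uniform starting vector is not orthogonal to the
    leading eigenvector.
    """
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * mv[i] for i in range(n))  # Rayleigh quotient
    return eigenvalue, v


def index_score(eigenvector, responses):
    """Score for one player: eigenvector times coded response vector."""
    return sum(e * x for e, x in zip(eigenvector, responses))


# Hypothetical pooled covariance matrix for three questions.
S = [[4.0, 2.0, 0.0],
     [2.0, 9.0, 3.0],
     [0.0, 3.0, 16.0]]
correlation = covariance_to_correlation(S)
eigenvalue, eigenvector = leading_eigenpair(correlation)
score = index_score(eigenvector, [-1, 0, 1])  # "yes", no answer, "no"
print(eigenvalue, eigenvector, score)
```

For this made-up matrix the largest eigenroot is 17/12 and the eigenvector is proportional to (4, 5, 3). The second and third indices would come from the next eigenvectors, which a general symmetric eigenvalue routine supplies directly.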

For the first index, scores for the 31 players are given for a particular
trial in Table 6.2. These scores reflect the degree to which participants felt
the experiment was valid prior to the particular trial.

VII. REPLACING BLOCK EFFECTS WITH COVARIABLES IN A DISCRIMINANT ANALYSIS.

In this experiment, the dependent variable (level of danger) is
categorical, so that discriminant analysis⁴ is a natural recourse with
treatments and blocks as predictors.

In the first analysis treatments were forced in as predictors while player
(block) effects were allowed to enter, if significant, through stepwise
discriminant analysis (4). In the second analysis, block effects were replaced
by covariables⁵ produced from eigenvectors corresponding to C and by
interactions of the covariables produced from the first three eigenvectors
with treatments. Again, treatments were forced in as predictors while the
other variables were scanned for significance as before.

One result was that, following scanning of variables for significance, the
U statistic (a measure of the goodness of the discriminant) dropped from .59
in the first analysis to .46 in the second. Thus, quantifying the player per
treatment allows for a greater explanation of the variability.

⁴Although the following example employs a discriminant analysis, this
quantification technique has been and can be used in the general linear model.

⁵An eigenvector when multiplied with the vector X_n of (1,0) responses will
produce the scores which become the measure(s) of the covariable(s) to be
used as the predictors in the model.

The model exercise of the discriminant function through Bayes' Theorem
is given in Table 7.1. Presented therein are the probabilities of the four
danger categories given the particular treatment and given a particular score
for the first index. For example, for treatment 7, a high score for index 1
was contrasted with a low score. Of those who thought the experiment was
valid (large negative scores for index 1), 57% thought treatment 7 was very
dangerous, 36% considered it quite dangerous, 6% considered it fairly dangerous,
while 1% said it was not very dangerous. These probabilities are contrasted
with those associated with individuals who thought the experiment was not valid.
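The Bayes' Theorem step that turns discriminant output into category probabilities can be sketched as below. The numbers are hypothetical and are not taken from Table 7.1; `priors` stands for assumed prior probabilities of the four danger categories, and `likelihoods` for the assumed densities of an observed treatment and index score under each category.

```python
def posterior_probabilities(priors, likelihoods):
    """Bayes' Theorem: P(category | data) is proportional to prior * likelihood."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


categories = ["very", "quite", "fairly", "not very"]
priors = [0.4, 0.3, 0.2, 0.1]        # hypothetical prior probabilities
likelihoods = [0.2, 0.4, 0.1, 0.3]   # hypothetical densities of the observed data

posterior = posterior_probabilities(priors, likelihoods)
for name, prob in zip(categories, posterior):
    print(f"{name} dangerous: {prob:.2f}")
```

Each row of a table such as Table 7.1 would correspond to one call of this kind, with the likelihoods evaluated at the treatment and index score of interest.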

The obvious implication here is that the players' "propensity to
participate" going into a particular trial has an overwhelming effect on the
outcome. Without adjustments for these states, experimental results would have
fallen somewhere between the two sets of results in Table 7.1. Thus, it can be
seen that the quantification of the player in this way not only provided a more
efficient analysis, but also some insight into the dynamics of the experiment
which would ultimately lead to better experimental technique.

TABLE 6.1 PLAYER QUESTIONNAIRE AND FIRST THREE PRINCIPAL COMPONENTS

FIGURE 6.2 PLAYER SCORES FOR VALIDITY INDEX


TABLE 7.1 PROBABILITIES OF DANGER LEVELS FOR GIVEN TREATMENTS

ERRORS IN LINEAR FITS DUE TO FUNCTION MISMATCH
AND NOISE WITH SPLINE APPLICATIONS
G. W. Lank, W. B. Kendall, P. A. Gartenberg
MARK Resources, Inc., Marina del Rey, California

INTRODUCTION 

In producing trajectory estimates from noisy radar data it is generally
necessary to smooth the radar data by fitting a deterministic function to
it. The choice of function depends on how much is known about the trajectory.
However, usually all that is known is that range as a function of time will
be a "smooth" function with "small" values for its higher derivatives. Then
a reasonable and practical choice for the deterministic function is a
polynomial of low order. This is the function which has zero for all
derivatives beyond a certain order, and thus will be a good approximation to
any true range function which has sufficiently small higher derivatives over
the smoothing interval.

A smoothing function related to polynomials, but which has wider
applicability, is the polynomial spline. This function consists of a series of
polynomials which are used over contiguous time intervals to represent the
true range function. The individual time intervals are chosen to be
sufficiently short for all higher-order derivatives to be negligible (i.e.,
over each short interval the range data very nearly follow a polynomial),
and smoothness of the overall fit is achieved by constraining the individual
polynomials to match their neighbor's value, slope, and perhaps higher
derivatives, at the boundaries (knots) between polynomials. This function has
the advantage that it can be used to smooth data over intervals which are
far too long to use a low-order polynomial, but at the same time it is much
more constrained (and, therefore, much smoother) than a higher-order
polynomial.

PROBLEM TO BE SOLVED

This brings us to the problem addressed here. When fitting splines
to noisy measurements there are two distinct sources of error which
prevent the fitted smooth function from being equal to the true
noise-free underlying function: (1) Even in the absence of noise the
underlying (trajectory) function may not be of the form of a spline, so that
a perfect fit is impossible. (2) Noise in the (range) measurements
of the underlying function prevents a perfect fit. Quantitative results
for the effects of these two error sources can be obtained as follows.

FORMULATION 

Assume we observe a function, such as range versus time, at M discrete
times which are not necessarily uniformly spaced. The observations of the
function have additive noise present in each sample. The noise is Gaussian,
zero mean, independent from sample to sample, and has the same variance
σ² at each sample. The observed noisy function is to be fitted in time
by the weighted sum of F basis functions. In general, if the function were
to be observed noise free, its form would not necessarily be exactly equal
to a weighted sum of the F basis functions. A set of basis functions which
is used in practice is the set of functions which yield a polynomial spline.

THE  ERROR  AVERAGED  OVER  TIME 

The statistics of the sum E_T of all the squared errors at the sampled
times (i.e., the sum of the squared differences between the resultant weighted
sum of the basis functions and the actual noise-free function) are found. It
is found that E_T has a biased χ² distribution with F degrees of freedom,
with the variance corresponding to each degree of freedom given by σ².

The bias is the sum of the squared errors which would exist at the sampled
times if no noise were present. It is due to the fact that the noise-free
function is not necessarily exactly equal to a weighted sum of the
F basis functions.

The probability density of E_T is specifically given by

p(E_T) = x^{F/2-1} e^{-x/2} / [2^{F/2} σ² Γ(F/2)],    x > 0
       = 0,                                           x < 0

where

x = (E_T - E_B)/σ²,
E_B = bias,
F = number of basis functions,
Γ(·) = the gamma function.

The significant characteristic of E_T as far as the noise is concerned
is that for a given bias the probability density of E_T depends only
upon σ² and F, and not on the specific functional form of the basis
functions used. It is also independent of the number of sampled points.
Furthermore, the ensemble average of E_T is

Ē_T = σ² F + E_B.

Thus, the larger the number F of degrees of freedom, the larger will be
the expected error.
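A quick numerical check of these two results is sketched below. It writes E_T for the summed squared error and E_B for the bias (symbol names assumed here, since the printed symbols are damaged), takes the density as p(E_T) = x^{F/2-1} e^{-x/2} / [2^{F/2} σ² Γ(F/2)] with x = (E_T - E_B)/σ² > 0, and uses hypothetical values for the bias, σ² and F. The helper names are introduced for illustration.

```python
import math


def error_density(e_t, bias, sigma2, f):
    """Density of E_T: a chi-square with f df, scaled by sigma2, shifted by bias."""
    x = (e_t - bias) / sigma2
    if x <= 0.0:
        return 0.0
    return (x ** (f / 2 - 1) * math.exp(-x / 2)
            / (2 ** (f / 2) * sigma2 * math.gamma(f / 2)))


def expected_error(bias, sigma2, f):
    """Ensemble average of E_T: sigma^2 * F + bias."""
    return sigma2 * f + bias


def integrate(func, lower, upper, steps=20000):
    """Midpoint-rule integral of func over [lower, upper]."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (i + 0.5) * width) for i in range(steps))


BIAS, SIGMA2, F = 1.0, 2.0, 4        # hypothetical values
upper = BIAS + 60.0 * SIGMA2         # far enough into the tail to be negligible
total = integrate(lambda e: error_density(e, BIAS, SIGMA2, F), BIAS, upper)
mean = integrate(lambda e: e * error_density(e, BIAS, SIGMA2, F), BIAS, upper)
print(total, mean, expected_error(BIAS, SIGMA2, F))
```

The integral of the density comes out at one and the numerically integrated mean agrees with σ²F + E_B, which for these values is 9. Changing the bias only shifts the density along the E_T axis without altering its shape.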


If the structural forms of the basis functions are changed in order
to make the bias smaller, then the probability density p(E_T) will be
unaffected except for a shift of the function to lower values of E_T.
This shift equals the difference between the original and the new value
of E_B. This is true as long as the number F remains constant. Thus,
for constant F it may be possible to reduce the errors in the noise-free
function estimate by making functional changes in the basis functions.
Doing this will not affect the statistics of the error due to the presence
of noise. The effects of noise and E_B on the resultant error are thus
statistically independent.

THE  ERROR  AT  SPECIFIC  TIMES 

The squared error between the fit and the actual function at any
given time (not necessarily at a sampled time) has a non-central χ²
distribution with one degree of freedom. The noncentrality parameter is the
squared error between the weighted sum of the F basis functions and the
function to be fitted when no noise is present. The variance for the
one degree of freedom is the mean squared error due to the effect of the
noise.

It has been found that the variance at a specific time cannot
be obtained without knowledge of the basis functions, and even then
it cannot be obtained in closed form. However, it can be evaluated
readily by numerical computer techniques. This has been done for the
case of polynomial splines. The polynomials' first P-1 derivative(s)*
are assumed continuous at N knots. The values of the polynomials at
the knots at the beginning and end of the spline are not constrained.
Each of the polynomials making up the spline is of degree D. The knots
are not assumed to be uniformly spaced. Also, the times at which sampling
takes place are not uniformly spaced, nor do they have to occur at the
times at which the knots are placed. The number of degrees of freedom
in this case is given by

F = P + (N-1)(D+1-P).

*If we have P = 0 then P-1 is -1. In this case neither the function nor
its derivatives are continuous at the knots (i.e., independent polynomials
are fit between adjacent knots).
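The degrees-of-freedom count, read here as F = P + (N-1)(D+1-P), can be cross-checked by counting coefficients and constraints directly. The sketch assumes that the N knots include the two unconstrained end knots, so that there are N-1 polynomial segments and N-2 interior knots, each imposing P continuity conditions; the function names are introduced for illustration.

```python
def spline_degrees_of_freedom(n_knots, degree, p_continuity):
    """F = P + (N - 1)(D + 1 - P), as given in the text."""
    return p_continuity + (n_knots - 1) * (degree + 1 - p_continuity)


def count_by_constraints(n_knots, degree, p_continuity):
    """Free coefficients: (N - 1) segments of D + 1 coefficients each,
    less P continuity conditions at each of the N - 2 interior knots."""
    segments = n_knots - 1
    interior_knots = n_knots - 2
    return segments * (degree + 1) - interior_knots * p_continuity


# Third-degree splines with three and six knots, P from zero to three.
for n_knots in (3, 6):
    for p in range(4):
        f = spline_degrees_of_freedom(n_knots, 3, p)
        assert f == count_by_constraints(n_knots, 3, p)
        print(f"N={n_knots}, P={p}: F={f}")
```

Both counts agree, and they show how each added continuity condition removes one degree of freedom per interior knot, which by the earlier result lowers the noise contribution σ²F to the expected error.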

Examples of mean squared error versus time have been obtained for this case
using a computer program. Examples are shown in Figures 1 through 8. The
examples are all for third-degree polynomial splines (D=3). Cases have
been obtained using three knots and also six knots. Values of P used were
from zero to three, which covers the range of continuities which can exist
at the knots of a third-degree polynomial spline.

In all cases the M discrete times at which the function is sampled
are uniformly spaced. The value of M used was large, as this is the
situation of general interest. The total time of observation used for all
plots was one unit of time. Plots of mean-squared error multiplied by
$(M/\sigma^2)$ versus time were made. For any total time of observation and
any large M these plots can be used to obtain the mean-squared error versus
time. This is done by multiplying the ordinate by the actual $\sigma^2/M$
and the abscissa by the actual total time of observation.
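The rescaling rule just described can be written out as a small Python sketch. The function and parameter names are hypothetical; the only content taken from the text is the rule itself (ordinate times σ²/M, abscissa times total observation time).

```python
def rescale_plot_point(norm_time, norm_mse, total_time, noise_variance, num_samples):
    """Convert a point read from a normalized plot to physical units.

    norm_time      -- abscissa read from the plot (0..1, one unit of time)
    norm_mse       -- ordinate read from the plot, i.e. MSE * (M / sigma^2)
    total_time     -- actual total time of observation
    noise_variance -- actual sigma^2 of the measurement noise
    num_samples    -- actual number of samples M (assumed large)
    """
    actual_time = norm_time * total_time
    actual_mse = norm_mse * noise_variance / num_samples
    return actual_time, actual_mse


if __name__ == "__main__":
    print(rescale_plot_point(0.5, 4.0, 10.0, 2.0, 500))
```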

CONCLUSION 

The errors in spline fits to noisy data have been analyzed, and
their probability distribution has been determined. Closed-form results
were obtained for the statistics of the squared error averaged over
time. Numerical results for the statistics of the squared error as a


Figure 3. Mean-squared Error for Three Knots (Third-Degree Polynomials
Continuous to First Derivative).

Figure 4. Mean-squared Error for Three Knots (Third-Degree Polynomials
Continuous to Second Derivative).

Figure 5. Mean-squared Error for Three Knots (Third-Degree Polynomials
Continuous to Third Derivative).

Figure 6. Mean-squared Error ... Polynomials).

Figure 7. Mean-squared Error for Six Knots (Third-Degree Polynomials
Continuous to First Derivative).

Figure 8. Mean-squared Error for Six Knots (Third-Degree Polynomials
Continuous to Second Derivative).

ABSTRACT

One of the primary functions of a gun air defense system
is to accurately predict the target position a time-of-flight
into the future. If the future position is known accurately,
then one can be reasonably assured of a hit on target provided
that the remaining fire control errors as well as
errors arising from uncertainties in ballistic and meteorological
conditions are small.

The availability of aircraft flight data made it possible
for the first time to analyze aircraft motion statistically
and to build models of aircraft motion. These models are, of
necessity, statistical because those components of aircraft
motion induced by wind gusts, terrain features, and evasive
maneuvers are generally unknown and must therefore be treated
as random variables.

It was found that models of rate of change of target
acceleration as autoregressive moving average processes lead
to prediction schemes which enhanced the predictability of
target future position, especially at extended ranges (long
time-of-flight). Furthermore, these models were found to
exhibit a remarkable degree of robustness; a lack of sensitivity
to changes in the coefficients of the autoregressive
models as well as to changes in aircraft maneuvers
seems to be an inherent feature of these models.

Other variables, chosen to be more explicitly tied to the
dynamics of aircraft motion and less dependent on the choice
of coordinates, were also modeled as autoregressive processes.
Again, the results were encouraging, indicating that significant
improvements in predictive capability inhere in the
autoregressive models.

*Much of the work done on this project was contributed by
Max Mintz, Steve Heullng, Stan Goodman.

I. INTRODUCTION

The most common prediction schemes in use for many years
in the air defense community were the so-called linear and
quadratic prediction equations

$$x(t + t_p) = x(t) + \dot{x}(t)\,t_p$$

$$x(t + t_p) = x(t) + \dot{x}(t)\,t_p + \tfrac{1}{2}\,\ddot{x}(t)\,t_p^2$$

referenced to some inertial coordinate system.* The realization
that these equations fare poorly against highly maneuvering
targets led to the development of numerous predictors,
including polynomial types after Blackman,(1) constant energy
and defense of a known point after H. Weiss,(2) as well as the
variety derivable from the Wiener as well as the more general
Kalman-Bucey filter equations. Unfortunately, the performance
of these predictors against real targets remained largely
unknown. With the availability of attack aircraft data in
1974, however, their relative predictive capabilities could
be determined.(3) The results led to the conclusion that no
one predictor is best for all classes of attack maneuvers for
a particular aircraft. Furthermore, it was found that the
single largest contributor to the prediction error lay not in
the availability of accurate knowledge of target state, but
rather in the unpredictable pilot induced maneuvers.
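For reference, the linear and quadratic prediction equations quoted above can be sketched in a few lines of Python for a single coordinate. The function names are illustrative; the position, rate, and acceleration are assumed to be supplied by some upstream tracking filter that is not modeled here.

```python
def predict_linear(x, x_dot, t_p):
    """Linear predictor: constant-velocity extrapolation over t_p seconds."""
    return x + x_dot * t_p


def predict_quadratic(x, x_dot, x_ddot, t_p):
    """Quadratic predictor: constant-acceleration extrapolation over t_p seconds."""
    return x + x_dot * t_p + 0.5 * x_ddot * t_p ** 2


if __name__ == "__main__":
    # A target at 100 m closing at 50 m/s, predicted 2 s ahead.
    print(predict_linear(100.0, -50.0, 2.0))
    # Same idea with an acceleration term included.
    print(predict_quadratic(0.0, 10.0, 4.0, 3.0))
```

Both predictors are deterministic: any maneuver begun after time t appears directly as prediction error, which is the weakness the statistical models in the following sections try to address.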

Rather  than  try  to  formulate  a new  set  of  deterministic 
equations  as  in  (1)  and  (2),  one  is  thus  led  to  consider 
statistical  models  of  target  motion.  Although  there  is  no 
a-priori  reason  for  believing  that  autoregressive  models  will 
lead  to  better  predictors,  their  consideration  appears 
reasonable  in  view  of  the  exhaustive  efforts  already  directed 
to  alternative  schemes. 


*For  the  short  times  of  flight  involved  (1-4  sec),  the 
rotation  of  the  earth  can  be  neglected.  Thus,  a 
coordinate  frame  fixed  to  a stationary  weapon  system 
can be viewed as an inertial reference frame.

II. AR MODELS


Let $x_n$ be some generalized coordinate. The variable
x takes on the value $x_i$ at time $(i-1)\Delta$, where $\Delta$ is
some time increment. Autoregressive models, then, are
governed by the following assumptions:

(1) $x_n = \beta_1 x_{n-1} + \beta_2 x_{n-2} + \ldots + \beta_p x_{n-p} + u_n$

(2) $E[u_n] = 0$ for all n, and $E[u_n u_m] = \sigma^2 \delta_{mn}$ for all m, n,

where $E[\cdot]$ indicates an ensemble average over $\cdot$.

If $\beta_p \neq 0$, then the model is an autoregressive (AR) model
of order p.

Since $\{u_n\}$ satisfies (2) for all n, and $x_n$ is given by (1),
$u_n$ is uncorrelated with $x_{n-1}, x_{n-2}, \ldots$, and $E[x_m] = 0$
for all m.

If

(3) $r(k) \equiv E[x_{n+k}\,x_n]$

then one can determine the $\beta_i$'s in terms of the covariance
function r(k) as follows:

Multiply (1) successively by $x_{n-1}, x_{n-2}, \ldots, x_{n-p}$ to
obtain p equations of the form

(4) $x_n x_{n-j} = \beta_1 x_{n-1} x_{n-j} + \beta_2 x_{n-2} x_{n-j} + \ldots + \beta_p x_{n-p} x_{n-j} + u_n x_{n-j}$

Taking the expectation value of both sides of (4), one
obtains p equations in p unknowns. The r(k)'s are assumed
known. Defining $r_p$ and R by

$$r_p = \begin{bmatrix} r(1) \\ r(2) \\ \vdots \\ r(p) \end{bmatrix}, \qquad
R = \begin{bmatrix}
r(0) & r(1) & \ldots & r(p-1) \\
r(1) & r(0) & \ldots & r(p-2) \\
\vdots & & & \vdots \\
r(p-1) & \ldots & & r(0)
\end{bmatrix}$$

the p equations can be expressed more concisely by

(5) $R\,\beta = r_p$

or

(6) $\beta = R^{-1} r_p$

where $\beta$ is the column matrix $(\beta_1, \beta_2, \ldots, \beta_p)^T$.
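Equations (5) and (6) amount to solving a small symmetric Toeplitz system. The Python sketch below is one way to do that, assuming the covariances r(0), ..., r(p) are already known. The function names are illustrative, and plain Gaussian elimination is used only to keep the example self-contained; it is not claimed to be the method used in the study.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ar_coefficients(r, p):
    """Return [beta_1, ..., beta_p] from covariances r = [r(0), ..., r(p)]."""
    if len(r) < p + 1:
        raise ValueError("need r(0) through r(p)")
    big_r = [[r[abs(i - j)] for j in range(p)] for i in range(p)]  # matrix R
    r_p = [r[k] for k in range(1, p + 1)]                          # vector r_p
    return solve_linear_system(big_r, r_p)


if __name__ == "__main__":
    # Covariances of a first-order process with beta_1 = 0.8: r(k) = 0.8**k.
    print(ar_coefficients([0.8 ** k for k in range(3)], 2))
```

Fitting a second-order model to first-order covariances should return a second coefficient of essentially zero, which is a useful sanity check on any implementation.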

III. ESTIMATION OF $\beta$ FROM ACTUAL TIME SERIES DATA

The determination of $\beta$ is dependent upon one's ability
to compute r(k) from an ensemble of time series data. In
practice, however, such data is often unavailable for obvious
reasons; time, money, and resources often do not permit the
accumulation of such data. This is especially true in the
present discussion where replication of aircraft flight paths
becomes both time consuming and costly. One is thus led to
consider replacement of ensemble averages with averages over
time. Thus, in place of (3), one estimates r(k) with

$$\hat{r}(k) = \frac{1}{N}\sum_{i=1}^{N-k} x_{i+k}\,x_i .$$

The matrix $\hat{\beta}$ is then formed by replacing r(k) with $\hat{r}(k)$
in the relation

$$\beta = R^{-1} r_p .$$
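A minimal Python sketch of the time-average covariance estimate follows. It assumes a single zero-mean record of N samples, as the model in Section II does; the function name is illustrative.

```python
def sample_covariance(x, max_lag):
    """Estimate r(0), ..., r(max_lag) from one zero-mean record x of length N.

    Uses r_hat(k) = (1/N) * sum_{i=1}^{N-k} x[i+k] * x[i], i.e. the sum is
    divided by N (not N - k), matching the estimate given in the text.
    """
    n = len(x)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the record length")
    return [
        sum(x[i + k] * x[i] for i in range(n - k)) / n
        for k in range(max_lag + 1)
    ]


if __name__ == "__main__":
    print(sample_covariance([1.0, -1.0, 1.0, -1.0], 2))
```

Dividing by N rather than N-k biases the estimate toward zero at large lags, but it keeps the estimated covariance matrix positive semidefinite, which matters when the matrix is inverted.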


IV.  MODEL  IDENTIFICATION 

The determination of the "proper" choice of p is an
important practical question. The following result(4) is
often useful:

If for a given choice of p the estimated value of $\beta_p$
derived from the sample covariance function satisfies
$|\hat{\beta}_p| \ll 1/\sqrt{N}$, then one can assume that $\beta_p = 0$ and hence
check the model with order p-1. However, in order for this
result to hold, one must strengthen the assumptions with
respect to $\{u_n\}$. Specifically, one assumes that the $\{u_n\}$
are independent and identically distributed random
variables.
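The order check can be sketched as follows, reading the criterion as a comparison of the magnitude of the last estimated coefficient with 1/sqrt(N), where N is taken to be the number of samples. How much smaller counts as "much less than" is not stated in the text, so the `factor` argument below is a purely hypothetical threshold chosen for illustration.

```python
import math


def last_coefficient_negligible(beta_hat, num_samples, factor=0.1):
    """Return True if |beta_p| is well below 1/sqrt(N).

    beta_hat    -- estimated coefficients [beta_1, ..., beta_p]
    num_samples -- N, length of the record used for the estimate
    factor      -- hypothetical margin standing in for "much less than"
    """
    if not beta_hat:
        raise ValueError("need at least one coefficient")
    return abs(beta_hat[-1]) < factor / math.sqrt(num_samples)


if __name__ == "__main__":
    # With N = 400 the reference level is 1/20 = 0.05.
    print(last_coefficient_negligible([0.7, 0.001], 400))  # candidate for order p-1
    print(last_coefficient_negligible([0.7, 0.2], 400))    # keep order p
```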

Fortunately, the autoregressive models considered here
turned out to be of order no higher than six.

V. OTHER MODELS


Models other than autoregressive models were also considered.
These included the moving average and autoregressive
moving average models. These are characterized,
respectively, by

(7) $x_n = u_n + \alpha_1 u_{n-1} + \ldots + \alpha_q u_{n-q}$

with

$$E[u_n] = 0, \qquad E[u_n u_m] = \sigma^2 \delta_{nm}$$

and by

(8) $x_n = \beta_1 x_{n-1} + \ldots + \beta_p x_{n-p} + u_n + \alpha_1 u_{n-1} + \ldots + \alpha_q u_{n-q}$

with

$$E[u_n] = 0, \qquad E[u_n u_m] = \sigma^2 \delta_{nm}.$$

Equation (7) is a qth order moving average process and
(8) is a (p, q) autoregressive moving average process.

Analysis of the aircraft data indicates that aircraft
motion is adequately modeled as an autoregressive process
rather than either a moving average or an autoregressive moving
average process.




VI.  PREDICTION 

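For a process described by the autoregressive model

$$x_n = \beta_1 x_{n-1} + \beta_2 x_{n-2} + \ldots + \beta_p x_{n-p} + u_n$$

one wishes to estimate $x_{n+k}$ from the known values $x_{n-1}$,
$x_{n-2}, \ldots, x_{n-p}$. This is accomplished by first noting the
following:

(9) $E[x_n \mid x_{n-1}, x_{n-2}, \ldots, x_{n-p}] = \beta_1 x_{n-1} + \beta_2 x_{n-2} + \ldots + \beta_p x_{n-p}$

(10) $E[x_{n+k} \mid x_{n-1}, x_{n-2}, \ldots, x_{n-p}] = \eta_1(k)\,x_{n-1} + \eta_2(k)\,x_{n-2} + \ldots + \eta_p(k)\,x_{n-p} = \hat{x}_{n+k}$

where k > 0.

The prediction procedure to be derived will be recursive.
The resulting equations can be easily implemented and concisely
expressed in matrix form. For this purpose, we
relabel the generalized coordinates $x_n$ as follows:

$y_1(n) = x_n$
$y_2(n) = x_{n-1}$
$\ldots$
$y_p(n) = x_{n-p+1}$

Thus,

(11) $$\begin{bmatrix} y_1(n) \\ y_2(n) \\ \vdots \\ y_p(n) \end{bmatrix} =
\begin{bmatrix}
\beta_1 & \beta_2 & \ldots & \beta_{p-1} & \beta_p \\
1 & 0 & \ldots & 0 & 0 \\
0 & 1 & \ldots & 0 & 0 \\
\vdots & & & & \vdots \\
0 & 0 & \ldots & 1 & 0
\end{bmatrix}
\begin{bmatrix} y_1(n-1) \\ y_2(n-1) \\ \vdots \\ y_p(n-1) \end{bmatrix} +
\begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} u_n$$

Define

(12) $$y(n) = \begin{bmatrix} y_1(n) \\ y_2(n) \\ \vdots \\ y_p(n) \end{bmatrix}, \qquad
\Phi = \begin{bmatrix}
\beta_1 & \beta_2 & \ldots & \beta_p \\
1 & 0 & \ldots & 0 \\
0 & 1 & \ldots & 0 \\
\vdots & & & \vdots \\
0 & \ldots & 1 & 0
\end{bmatrix}, \qquad
\Gamma = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$

Then

(13) $y(n) = \Phi\,y(n-1) + \Gamma\,u_n$

Now

$$E[u_n \mid x_{n-1}, \ldots, x_{n-p}] = 0$$

and

$$E[y(n-1) \mid x_{n-1}, \ldots, x_{n-p}] = \begin{bmatrix} x_{n-1} \\ x_{n-2} \\ \vdots \\ x_{n-p} \end{bmatrix}$$

So if

$$\hat{y}(n-1) = E[y(n-1) \mid x_{n-1}, \ldots, x_{n-p}]$$

and

$$\hat{y}(n) = E[y(n) \mid x_{n-1}, \ldots, x_{n-p}]$$

then

(14) $\hat{y}(n) = \Phi\,\hat{y}(n-1)$

In general, if

$$\hat{y}(n+k) = E[y(n+k) \mid x_{n-1}, \ldots, x_{n-p}]$$

then

(15) $\hat{y}(n+k) = \Phi^{k+1}\,\hat{y}(n-1)$

which is the scheme by which prediction is accomplished for
a pth order autoregressive process.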
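The recursion leading to (15) can be sketched in Python by building the transition matrix from the AR coefficients and applying it k+1 times to the vector of the last p known values. The names below are illustrative, and repeated multiplication is used instead of forming the matrix power explicitly.

```python
def companion_matrix(beta):
    """Transition matrix: first row holds the AR coefficients, then a shifted identity."""
    p = len(beta)
    phi = [[0.0] * p for _ in range(p)]
    phi[0] = [float(b) for b in beta]
    for i in range(1, p):
        phi[i][i - 1] = 1.0
    return phi


def mat_vec(m, v):
    """Multiply matrix m by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def predict_ar(beta, history, k):
    """Return the predicted x_{n+k} given history = [x_{n-1}, ..., x_{n-p}].

    Applies the transition matrix k + 1 times, as in (15); k = 0 gives
    the one-step prediction of (14).
    """
    if len(history) != len(beta):
        raise ValueError("history must hold exactly p values")
    if k < 0:
        raise ValueError("k must be non-negative")
    phi = companion_matrix(beta)
    y = [float(h) for h in history]
    for _ in range(k + 1):
        y = mat_vec(phi, y)
    return y[0]  # first component is the predicted coordinate


if __name__ == "__main__":
    print(predict_ar([0.5], [8.0], 2))             # first-order model
    print(predict_ar([1.0, -0.5], [2.0, 1.0], 1))  # second-order model
```

Because the noise term has zero conditional mean, each application of the matrix simply replaces an unknown future value with its predicted value.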


VII. PREDICTION USING AIRCRAFT FLIGHT DATA


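As seen in Section II, $u_n$ is characterized as a white
noise sequence. That is,

$$E[u_n] = 0, \qquad E[u_n u_m] = \sigma^2 \delta_{nm}.$$

Analysis of the aircraft data indicated that if $\dddot{x}$ is the
rate of change of target acceleration in some inertial coordinate
frame, then it satisfies approximately the statistical
assumptions of AR models. One can then predict future position
with the aid of the following assumptions:

(16) $x_n = x_{n-1} + \Delta\,\dot{x}_{n-1}$

(17) $\dot{x}_n = \dot{x}_{n-1} + \Delta\,\ddot{x}_{n-1}$

(18) $\ddot{x}_n = \ddot{x}_{n-1} + \Delta\,\dddot{x}_{n-1}$

with $\dddot{x}$ modeled as an autoregressive process:

(19) $\dddot{x}_n = \beta_1 \dddot{x}_{n-1} + \beta_2 \dddot{x}_{n-2} + \ldots + \beta_p \dddot{x}_{n-p} + u_n$

Proceeding as in Section VI, the coordinate, its first two
derivatives, and the last p values of $\dddot{x}$ are stacked into a
single state vector, and the one-step relation is written in
partitioned form:

$$\begin{bmatrix} s_n \\ w_n \end{bmatrix} =
\begin{bmatrix} A & B \\ 0 & \Phi \end{bmatrix}
\begin{bmatrix} s_{n-1} \\ w_{n-1} \end{bmatrix} +
\begin{bmatrix} 0 \\ \Gamma \end{bmatrix} u_n$$

where $s_n = (x_n, \dot{x}_n, \ddot{x}_n)^T$ and
$w_n = (\dddot{x}_n, \dddot{x}_{n-1}, \ldots, \dddot{x}_{n-p+1})^T$. Here A is the
3 x 3 matrix of coefficients multiplying $x_{n-1}$, $\dot{x}_{n-1}$, and
$\ddot{x}_{n-1}$; B is the 3 x p matrix whose first column holds the
coefficients multiplying $\dddot{x}_{n-1}$ and whose remaining columns are
zero; and $\Phi$ and $\Gamma$ are as defined in (12).

Thus,

(20) $$\begin{bmatrix} \hat{s}_{n+k} \\ \hat{w}_{n+k} \end{bmatrix} =
\begin{bmatrix} A & B \\ 0 & \Phi \end{bmatrix}^{k+1}
\begin{bmatrix} s_{n-1} \\ w_{n-1} \end{bmatrix}$$

which is the analogue of equation (15).

The reason for partitioning the transition matrix as
above is that this results in substantial savings in
computational labor.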
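The prediction scheme of this section can be sketched in Python without forming the full transition matrix. The sketch assumes that each kinematic quantity advances by the time increment times the next-higher derivative at the previous step, which is one reading of assumptions (16) through (18), and that the driving noise is replaced by its zero mean. All names are illustrative.

```python
def predict_position(x, x_dot, x_ddot, jerk_history, beta, dt, steps):
    """Propagate position `steps` increments ahead.

    x, x_dot, x_ddot -- position, velocity, acceleration at step n-1
    jerk_history     -- [j_{n-1}, j_{n-2}, ..., j_{n-p}], rate of change of acceleration
    beta             -- AR coefficients [beta_1, ..., beta_p] for the jerk process
    dt               -- time increment between samples
    steps            -- number of one-step updates to apply (k + 1 in the text)
    """
    if len(jerk_history) != len(beta):
        raise ValueError("jerk_history must hold exactly p values")
    jerks = [float(j) for j in jerk_history]
    for _ in range(steps):
        j_prev = jerks[0]
        # Kinematic updates: every right-hand side uses previous-step values.
        x, x_dot, x_ddot = x + dt * x_dot, x_dot + dt * x_ddot, x_ddot + dt * j_prev
        # AR prediction of the next jerk value with the noise set to zero.
        j_next = sum(b * j for b, j in zip(beta, jerks))
        jerks = [j_next] + jerks[:-1]
    return x


if __name__ == "__main__":
    print(predict_position(0.0, 0.0, 0.0, [1.0], [0.5], 1.0, 4))
```

Only the jerk history needs the autoregressive update; the three kinematic states are advanced by a fixed small set of operations, which is one way to see the computational saving from partitioning.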

From the raw aircraft data that was available, two
independent sets of $\dddot{x}$ data were produced. One set was
derived from smoothed accelerometer data from on board
the aircraft, and the other by thrice differencing
smoothed position data. Prediction for the second set,
i.e., the thrice differenced data, is accomplished by
computing $(x_{n+3} - 3x_{n+2} + 3x_{n+1} - x_n)$, then proceeding as in (20).
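The thrice-differencing step can be sketched as below. Dividing the third difference by the cube of the sample spacing to obtain a rate in physical units is an assumption added here for illustration; the text gives only the difference itself. The function name is hypothetical.

```python
def third_difference_jerk(positions, dt):
    """Approximate the rate of change of acceleration from sampled positions.

    Forms x[n+3] - 3*x[n+2] + 3*x[n+1] - x[n] for every n and divides by
    dt**3 (the scaling is an assumption, see the lead-in).
    """
    if len(positions) < 4:
        raise ValueError("need at least four position samples")
    return [
        (positions[n + 3] - 3 * positions[n + 2] + 3 * positions[n + 1] - positions[n])
        / dt ** 3
        for n in range(len(positions) - 3)
    ]


if __name__ == "__main__":
    # Positions following x = t**3 sampled every 0.5 s have constant jerk 6.
    samples = [t ** 3 for t in (0.0, 0.5, 1.0, 1.5, 2.0)]
    print(third_difference_jerk(samples, 0.5))
```

Differencing three times amplifies measurement noise strongly, which is why the text specifies smoothed position data as the input.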

The autoregressive coefficients for the two classes of
predictors will be different. This is exhibited in Figures
1 and 2 which show the distribution of the roots of the
characteristic equations for the x-axis. Here, each symbol
is associated with a separate and distinct flight path.

It is interesting to note that the groupings of the roots
are quite different for the two models. (This is also
true for the y and z axes which are not shown.) However,
the roots show a marked similarity within a particular class.
This suggests a commonality in the statistical description
of the data, although the full import of this feature was
not investigated.

Comparing the performance of the two classes of models,
it was found that the predictors developed from the thrice
differenced data do not perform as well as the $\dddot{x}$ predictors,
but the differences are not substantive for short $T_f$.

VIII. VELOCITY-GAMMA-SIGMA AR MODEL


Here we define a new set of dynamic variables $(v, \gamma, \sigma)$ where v is the
aircraft velocity, $\gamma$ the angle between v and the horizontal plane, and
$\sigma$ the angle between the projection of v onto the horizontal plane and
the y-axis.* The mean values of the generalized coordinates are not zero as
in the previous cases, so they are removed adaptively from a one second time
window. Thus, the AR models are

(21) $\hat{v}_n - \bar{v}_n = \beta_1 (v_{n-1} - \bar{v}_n) + \ldots + \beta_5 (v_{n-5} - \bar{v}_n)$

where $\bar{v}_n$ is the adaptive mean

(22) $\bar{v}_n = \frac{1}{N} \sum_{i=1}^{N} v_{n-i}, \qquad N = 10,$

the data rate being 10 data points/sec. The variable v, it turns out, is
adequately modeled as a fifth order AR process.
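The adaptive mean removal can be sketched in Python as follows. The sketch assumes the mean is taken over the ten samples preceding the one being predicted (one second at 10 points/sec) and that the same mean is subtracted from every lagged value and added back to the prediction; the exact indexing in the original equations is not fully legible, so this is an interpretation. Names are illustrative.

```python
def adaptive_mean(samples, n, window=10):
    """Mean of samples[n-window], ..., samples[n-1]."""
    if n < window:
        raise ValueError("not enough samples before index n")
    return sum(samples[n - window:n]) / window


def predict_with_adaptive_mean(samples, n, beta, window=10):
    """One-step prediction of samples[n] from mean-removed lagged values."""
    if n < len(beta):
        raise ValueError("not enough samples for the model order")
    mean = adaptive_mean(samples, n, window)
    deviation = sum(b * (samples[n - j] - mean) for j, b in enumerate(beta, start=1))
    return mean + deviation


if __name__ == "__main__":
    speeds = [float(i) for i in range(10)]
    print(predict_with_adaptive_mean(speeds, 10, [1.0]))
```

A constant input is predicted exactly regardless of the coefficients, since every deviation from the window mean is zero.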

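Similarly,

(23) $\hat{\gamma}_n - \bar{\gamma}_n = \beta_1 (\gamma_{n-1} - \bar{\gamma}_n) + \ldots + \beta_6 (\gamma_{n-6} - \bar{\gamma}_n)$

Sigma, however, is modeled as a first-differenced AR process thereby
reducing the dependence on the orientation of the x-y coordinate axes.

(24) $\hat{\sigma}_n = \sigma_{n-1} + \overline{\Delta\sigma}_n + \beta_1 (\Delta\sigma_{n-1} - \overline{\Delta\sigma}_n) + \ldots + \beta_5 (\Delta\sigma_{n-5} - \overline{\Delta\sigma}_n)$

where

(25) $\Delta\sigma_{n-1} = \sigma_{n-1} - \sigma_{n-2}$

and

(26) $\overline{\Delta\sigma}_n = \frac{1}{N} \sum_{i=1}^{N} \Delta\sigma_{n-i}, \qquad N = 10.$

The relation between the target position and the dynamic variables
$(v, \gamma, \sigma)$ is non-linear so prediction must proceed recursively via the
following equations:

(27) $x_n = x_{n-1} + \Delta\,(v_{n-1} \cos\gamma_{n-1} \sin\sigma_{n-1})$

(28) $y_n = y_{n-1} + \Delta\,(v_{n-1} \cos\gamma_{n-1} \cos\sigma_{n-1})$

(29) $z_n = z_{n-1} + \Delta\,(v_{n-1} \sin\gamma_{n-1})$

* An inertial x, y, z set is used as before.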
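The position recursion in terms of speed and the two angles can be sketched in Python as below. It assumes the elevation angle is measured up from the horizontal plane and the heading angle is measured from the y-axis, as defined at the start of this section, with angles in radians. The function names are illustrative.

```python
import math


def step_position(x, y, z, v, gamma, sigma, dt):
    """Advance the position by one time increment dt.

    v     -- speed
    gamma -- angle between the velocity and the horizontal plane (radians)
    sigma -- angle between the horizontal projection of velocity and the y-axis (radians)
    """
    horizontal = v * math.cos(gamma)
    return (
        x + dt * horizontal * math.sin(sigma),
        y + dt * horizontal * math.cos(sigma),
        z + dt * v * math.sin(gamma),
    )


def propagate(position, speeds, gammas, sigmas, dt):
    """Apply step_position once for each predicted (v, gamma, sigma) triple."""
    x, y, z = position
    for v, gamma, sigma in zip(speeds, gammas, sigmas):
        x, y, z = step_position(x, y, z, v, gamma, sigma, dt)
    return x, y, z


if __name__ == "__main__":
    # Level flight along the y-axis at 100 m/s for ten steps of 0.1 s.
    print(propagate((0.0, 0.0, 0.0), [100.0] * 10, [0.0] * 10, [0.0] * 10, 0.1))
```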
IX. RESULTS

The predictors developed in Sections VII and VIII were
compared with some of the common predictors that have been
in use over the years. The comparisons were made by generating
a histogram of the number of shells fired from a
hypothetical gun air defense system which fall within specified
bins or regions of the target after some time-of-flight
$T_f$. No attempt was made to model error sources in the fire
control system or to generate realistic ballistic trajectories.

The shells were assumed to be free of the earth's gravitational
field and meteorological effects. The projectile
velocity was taken at 1000 m/sec. For the purpose of comparing
predictors, the added complexity of introducing fire
control errors and accurately modeling ballistic trajectories
seems unwarranted and does not shed light on the relative
efficacy of the predictors under comparison.

Comparison of the A-R predictors was made with the
following standard models:

Linear

$$\hat{x}(t + T_f) = x(t) + \dot{x}(t)\,T_f$$

Quadratic

$$\hat{x}(t + T_f) = x(t) + \dot{x}(t)\,T_f + \ddot{x}(t)\,T_f^2/2$$

First Order Markovian in Acceleration*

$$\hat{x}(t + T_f) = x(t) + \dot{x}(t)\,T_f + \frac{\ddot{x}(t)}{\omega^2}\left(e^{-\omega T_f} + \omega T_f - 1\right)$$

with $\omega = 0.1$.

Previous studies(3) comparing a larger class of predictors
of which the three above are a subset were made with the conclusion
that no single predictor is best over the range of
flight paths considered here. Thus, inclusion of this larger
class is unnecessary since nothing new will be learned that
is not already known.

* The model for this process is $\dddot{x} = -\omega\,\ddot{x} + u$ from which
the above equation is derived.
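The first-order Markov predictor can be sketched in Python as below. The acceleration term is written as the expected double integral of an acceleration that decays as exp(-ω t), which follows from the process model in the footnote; the default ω = 0.1 is the value quoted in the text, and the function names are illustrative.

```python
import math


def predict_markov(x, x_dot, x_ddot, t_f, omega=0.1):
    """First-order Markov-in-acceleration predictor over time of flight t_f."""
    wt = omega * t_f
    # expm1(-wt) + wt equals exp(-wt) + wt - 1, computed without cancellation loss.
    return x + x_dot * t_f + x_ddot * (math.expm1(-wt) + wt) / omega ** 2


def predict_quadratic(x, x_dot, x_ddot, t_f):
    """Constant-acceleration predictor, for comparison."""
    return x + x_dot * t_f + 0.5 * x_ddot * t_f ** 2


if __name__ == "__main__":
    print(predict_markov(0.0, 0.0, 1.0, 2.0))
    print(predict_quadratic(0.0, 0.0, 1.0, 2.0))
```

As ω approaches zero the Markov predictor tends to the quadratic predictor, so with ω = 0.1 the two differ only slightly at short times of flight.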
Tables 1 and 2 typify the type of data that was obtained
in comparing A-R predictors with the three above. The following
explanations of these tables are in order. Minimum
miss distance is the distance of closest approach between
the target and projectile. Regular one point misses refer
to the miss after time $T_f$. The bottom row of numbers designates
a distance between target and projectile. Column 5
differs from all other columns in that it lists the total
number of projectiles falling within 5 m of the target. The
remaining columns designate the number of projectiles falling
within a bin of certain width. For example, Column 3 gives
the number of projectiles falling within 2 to 3 meters of the
target, Column 7 the number of projectiles falling within
10 to 15 meters of the target, etc. The last column gives
the total number of rounds fired in a given time interval
($T_f$).
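The tabulation described above can be sketched in Python as a simple binning of miss distances. The bin edges used in the example are hypothetical, since the text names only the 2 to 3 m and 10 to 15 m bins and the 5 m cumulative column; the function names are also illustrative.

```python
from bisect import bisect_right


def bin_miss_distances(misses, edges):
    """Count misses falling in each half-open bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for d in misses:
        if edges[0] <= d < edges[-1]:
            counts[bisect_right(edges, d) - 1] += 1
    return counts


def count_within(misses, radius):
    """Total number of rounds passing closer than `radius` to the target."""
    return sum(1 for d in misses if d < radius)


if __name__ == "__main__":
    misses = [0.5, 2.5, 2.9, 4.0, 7.0, 12.0, 40.0]  # metres, made-up values
    edges = [0.0, 1.0, 2.0, 3.0, 5.0, 10.0, 15.0]   # hypothetical bin edges
    print(bin_miss_distances(misses, edges))
    print(count_within(misses, 5.0))
```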

As is evident from the figures, the A-R predictor performs
better than the quadratic predictor. Of particular interest,
however, is the region where $T_f > 3$ sec., where predictors
have traditionally fared poorly. Here, we see that with the
A-R predictors, some rounds fall in close proximity of the
target (i.e., within 15 m); in contrast, no rounds fall in the
region with the quadratic predictor.

These observations hold in general. That is, they can
be made for the entire class of flight paths investigated
(12 in number), as well as for the linear and Markovian
predictors. Furthermore, the A-R thrice-differenced predictors,
as well as the v-$\gamma$-$\sigma$ models, also perform markedly
better than either of the standard predictors.

Table 3 gives the performance of the A-R thrice-differenced
predictor for flight pass 13 (same as for Table 2) and Table 4
the performance of the v-$\gamma$-$\sigma$ predictors, also for the same
flight pass. Observe that the thrice-differenced predictor
does not perform quite as well as the $\dddot{x}$ predictor, an observation
alluded to in Section VII. In addition, the v-$\gamma$-$\sigma$
predictors do not fare as well as the $\dddot{x}$ predictors.

X. ROBUSTNESS

The sensitivity of the predictive performance of a
particular A-R model to changes in the autoregressive
coefficients is important because, in practice, one does
not have a priori knowledge of the statistics of target
motion from which one can compute these coefficients. One
is thus led to pose the following questions: How do the
predictors perform when the coefficients associated with a
particular axis are used for all three coordinate axes, and
how well can one predict with a single model for all available
flight passes? In answering these questions, it was
found that the A-R models exhibit a remarkable degree of
robustness. Table 5 gives the performance of a standard
thrice-differenced A-R predictor. Table 6, generated for
the same flight path, was produced by using the x-coefficients
for all three coordinates. Notice that little degradation in
performance was incurred by using a single set of coefficients.
Table 7 was generated by using a model developed for a different
flight pass. Again, the predictors perform quite well.
Using a single model for all flight passes, one is led to the
conclusion that a single set of A-R coefficients can be used
for prediction against a given aerial target.

XI. DISPERSION SYNTHESIS

In a realistic combat environment, more than one gun
air defense system will be employed for defense of a given
area. If a communications link is established between the
systems, one can enhance hit probability by firing the
guns at points in space dictated by some optimization
criteria. Optimization for the location of the bursts
was done for the case where four guns are employed. This
was done as follows:

Orient the y-axis along the target flight path. The
burst pattern is then defined in the x-z plane as in Fig.
3 below.

Fig. 3

The pattern is defined by:

$$\Delta x = a_1 + a_2 T_f + a_3 T_f^2$$

$$\Delta z = b\,\Delta x .$$

The four parameters were obtained via optimization using
the performance criterion

-p  - i ! .-d^! 

" i-i 

where $d_i$ is the minimum miss of the four shells and a = 5.0,
the radius of the "hit" circle.* This particular form for
the performance criterion was chosen in order that the
number of rounds falling within 5 m of the center of the
target be maximized.

* For attack aircraft, 5 m is roughly the effective radius
of the target.

Table 8 typifies the results obtained for all flight
passes. One finds that more rounds fall within the 5 m
distance of the target, although the percentage of rounds
falling within this distance is not necessarily larger.
However, there is a decrease in the RMS of the distance
of closest approach as expected. (Compare with Table 2.)
XII. CONCLUSIONS

The results presented heretofore are by no means
conclusive. Prediction using A-R models for dynamic
variables linked more closely to the dynamics of aircraft
motion is presently under investigation. The brief discussion
on dispersion synthesis is by no means the last
word on the subject and a game theoretic approach to the
problem seems to be in order. Degradation in predictive
capability under conditions of sensor and ballistic errors
remains to be determined. Nevertheless, the results
appear encouraging. The predictors discussed here, which
were designed and tested against actual aircraft data,
outperform any class of predictors developed heretofore.
As more data becomes available, additional tests of model
robustness can be made, using an already developed predictor
against a new set of flight data.

One is compelled to conclude that with some engineering
intuition and judgment, increased system performance can be
had for a cheap price by properly analyzing and modeling
threat data.

A SENSITIVITY EVALUATION OF A LARGE SCALE TACTICAL SYSTEM
AVAILABILITY  UNDER  VARYING  SUPPORT  RESOURCE  LEVELS 
ROBERT  A.  HALL  AND  HOWARD  M.  BRATT 
AVIATION  RESEARCH  AND  DEVELOPMENT  COMMAND 
Fort  Eustis,  Virginia 

Introduction

A major problem for Army decision makers and consequently Army
operations research analysts is the estimation of needed resources to
support large scale tactical systems. This problem is further compounded
by the question, what if I reduce the particular resource by XXX?

This paper presents one method of handling this problem, that is,
through the use of simulation. The US Army, through the Applied Technology
Laboratory, has developed computer mathematical programs that simulate the
experiences of a system in the field. These computer programs are known as
the Army Reliability and Maintainability Simulation (ARMS) model.* ARMS is
a highly complex set of computer programs that simulates the operational
and maintenance policies of a quantity of aircraft in the field. ARMS flies
the aircraft; breaks parts; fixes the parts, either on-aircraft or
off-aircraft, and if off-aircraft, at one of four different maintenance
levels; inspects the aircraft; and queues and limits the aircraft resources.
Use of ARMS allows the analyst to define his system to the detail he requires
or to the level to which he has data. This definition includes malfunction
rates, probability of remove and replace, times to repair, number of men
needed to perform the repair, time-between-overhaul, if applicable, and
off-equipment repair (higher level maintenance). Also defined are mission
scenarios by minute segments, scheduled calls for aircraft, continuous
missions, random missions, effects of flight essential failures, maintenance
concepts, manpower limits, and shift hours.

The fielded system chosen for study is the CH-47C. This is a highly
complex aircraft system that will provide a highly active system for study.

When estimating aircraft resources, there are three broad areas that
may be examined: GSE, manpower, and parts availability. This paper
examines all three areas showing the independent effects of a reduction
in each parameter.

A question arises, how do you measure the effects of a percentage
reduction in a resource? There are as many answers to this question as
there are interested groups wanting such an answer. We have chosen one
main variable for examination based on the assumption that the object of
maintenance is to get aircraft ready for launch. If aircraft are ready
when called, then the resources supporting that aircraft are sufficient.

The CH-47C Model


An ARMS model version of the tandem rotor, medium lift CH-47C
helicopter had been developed and validated several years ago under
contract with The Boeing Company.* This model with only minor changes
became the basic vehicle used in the current study. The CH-47C consisted
of 164 elements and 11 subsystems. For each element the following
data was provided:

Maintenance Actions per Operating Hour

Flight Criticality

Mission Equipment Essentiality

Probability that a Maintenance Action would be Discovered at
  Time of Failure

Probabilities that an Undiscovered Maintenance Action would
  be Discovered at Subsequent Scheduled Inspections and
  Mission Events

For Flight Critical Elements, the Consequences of a Failure
  During Flight, Probability Distribution Including Forced
  Landing, Attrition, Abort Mission and Continue Mission

The Probability that a Maintenance Action would Cause a Remove
  and Replace Event Rather than a Repair in Place

The Mean Time, Using the Exponential Distribution, to Repair
  in Place

Administrative Time Delay (RIP)

Ground Support Equipment Required (RIP)

Military Occupational Specialty (MOS) Code of Each Type of Mechanic
  Required and Number of Each Required (RIP)

The Probability that this Maintenance Action would Require a
  Functional Test Flight (RIP)

For Remove and Replace (R/R) Maintenance Actions, the Supply Delay
  Time to Obtain and Prepare the Replacement Part

Administrative Delay Time for such Things as Processing Paper
  Work, Scheduling the Maintenance Action, etc.

Ground Support Equipment and Maintenance Facilities Required for
  the R/R Action

The MOS Codes Required to Perform the R/R Action

The Probability a Spare Component would be in Stock when Requested
  (this parameter was used in the study)

The Restock Time, Delay to Obtain a Part on Order (3 days was used
  in this study)

Probability that this R/R Action would Result in the Requirement
  for a Maintenance Check Flight

There were 16 Elements with Scheduled Time Between Removal (TBO)
which varied from 2400 hours to 300 hours.

There were 8 scheduled maintenance events modeled in the study:


Daily Inspection, every 24 Hours if the Aircraft had Flown
  and in 72 Hours if the Aircraft had not Flown on the
  Previous 3 Days (not accomplished if the aircraft was
  down for maintenance or lack of spare parts)

12.5 Hour Spectrographic Oil Analysis Sample

25 Hour Preventative Maintenance Intermediate Inspection

25 Hour Spectrographic Oil Analysis Sample

100 Hour Preventative Maintenance Periodic (PMP) Inspection

90 Day Fire Extinguishing System Inspection

6 Month Pitot/Static System Inspection

12 Month Engine Fire Extinguisher Inspection

There were 2 maintenance shifts at organizational level:

Shift I   Start at 0600   Stop at 1400

Shift II  Start at 1400   Stop at 2200

Manpower quantity was a parameter in the study. The "Super Crew
Chief" concept, i.e., a mechanic trained in all maintenance disciplines,
was used. This concept was necessary to parameterize the manpower function
in the study. It could be thought of as the same as supporting a very large
number of vehicles which, because of the size of the fleet, require a
large number of each type of mechanic. The number of mechanics used in
the base case was 40 and this number was gradually reduced in subsequent
runs as discussed in the portion of the paper that describes the experiment.
Ground Support Equipment, another parameter used in the study, was also
generalized for the same reasons as applied to the manpower. In the base
case, 15 pieces of GSE were provided and this number, also, was subsequently
reduced during the experiment.

The third parameter used in the study was the probability of spare
parts being available when required for remove-and-replace actions. 100%
availability was used in the base case and the percentage was reduced in
subsequent runs of the model. Another variable that impacts the sensitivity
of parts availability is the period of time chosen for the delivery of
unavailable parts once they have been ordered. A time period of 72 hours
was chosen as a constant (no distribution function) for the resupply time
when parts probabilities were less than 100% in the experimental model runs.
Any user of the data reported in this paper must recognize that the
assumption of a 72 hour supply time has a significant impact on the
sensitivity of the results relative to the spares parameter. For example, a
resupply time approaching zero hours with a 50% probability of spares
availability would have the effect of providing almost 100% spares within a
few minutes of the time they were requested.

The mission chosen for the CH-47C helicopter is called the resupply mission,
in which the helicopter is carrying external loads of munitions to a forward
gun site. The following segments and elapsed times were used:


Ground preflight, engine start and taxi     30 Min
Flight                                      90 Min
Post flight, taxi and park                  30 Min
Refuel                                      30 Min


In simulation modeling in general, and especially when an attempt is
being made to quantify the optimum quantity of logistic support resources
necessary to obtain a desired effectiveness, it is essential that the
number of aircraft requested significantly exceed the maximum capability
of the system; in simulation parlance this is called "Loading the System."
The following mission schedule was requested, 7 days a week, for 4 weeks:

Take-Off Time     Max Number of Aircraft     Min

0700              4
0830              4
1000              4
1130              4
1300              4
1430              4
1600              4
1730              4
1900              4

The Max/Min numbers are to be interpreted as Max = the desired number
of aircraft per mission and Min = the minimum number of aircraft that will
be permitted to fly the mission. From this data, 1008 launches are
scheduled per 28 day month to fly 252 missions. In the base case, 753
aircraft launches were accomplished and all of the 252 scheduled missions
were flown with at least one aircraft on the mission. 74.7% of the
scheduled launches were met.
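The launch and mission counts quoted above follow directly from the schedule. The short Python check below reproduces them; the variable names are illustrative and only the figures stated in the text (9 take-off times, 4 aircraft maximum, 28 days, 753 accomplished launches) are used.

```python
# Mission schedule arithmetic, using only figures stated in the text.
takeoff_times = ["0700", "0830", "1000", "1130", "1300",
                 "1430", "1600", "1730", "1900"]
max_aircraft_per_mission = 4
days = 7 * 4  # 7 days a week for 4 weeks

missions = len(takeoff_times) * days                     # one mission per take-off time per day
scheduled_launches = missions * max_aircraft_per_mission
accomplished_launches = 753                              # base-case result reported in the text

percent_met = 100.0 * accomplished_launches / scheduled_launches
print(missions, scheduled_launches, round(percent_met, 1))  # 252 1008 74.7
```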

*In simulation modeling it is necessary to provide a simulated period
of stabilization running prior to the start of the data collection period.
The stabilization period is sized to assure that those functions and
interactions which occur during the simulation become stabilized before final
statistics are collected. That is, people are working and being demanded,
parts are being used and ordered, delays are occurring for lack of resources,
etc. To speed the stabilization, an initial quantity of flight hours is
distributed across the aircraft fleet and time scheduled removal components.
All runs include a 2 week stabilization period.

Remembering that the ARMS model is stochastic and that probability
distributions are widely used in the internal decision process, it is to
be understood that any one replication represents only one realization
in a distribution of possible outcomes. Therefore, for statistical
validity as well as for parameter smoothing, replications of the runs at
each data point using different random number streams are required. The
number of replications required to achieve statistical confidence will
vary with the scheduled activity within the simulated scenario and also
with the length of the simulation period. For the data used in this
report, 10 replications were made at each data point and a 28 day simulation
period was used. A mathematical average was made of the replicated values.
From this data, trend lines were computed for each test parameter using a
second degree polynomial regression program. Another parameter that could
have been used in this study would be the number of aircraft in the fleet.
For this study the number of CH-47C aircraft remained constant at 16.
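The replicate-average-then-fit procedure described above can be sketched as follows. This is a minimal illustration, not the ARMS model or the regression program actually used: `simulate_launches` is a hypothetical stand-in for one stochastic replication, its response curve and noise level are invented for the example, and `fit_quadratic` is an ordinary least-squares fit of a second degree polynomial solved through the normal equations.

```python
import random

def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2 via the 3x3 normal equations."""
    s = [sum(x ** p for x in xs) for p in range(5)]                    # power sums of x
    t = [sum((x ** p) * y for x, y in zip(xs, ys)) for p in range(3)]  # sums of x**p * y
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]      # augmented matrix
    for col in range(3):                                               # Gauss-Jordan elimination
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]

def simulate_launches(reduction_pct, rng):
    """Hypothetical stand-in for one replication (invented response plus noise)."""
    trend = 753.0 - 0.1 * reduction_pct - 0.08 * reduction_pct ** 2
    return trend + rng.gauss(0.0, 5.0)

reductions = [0, 10, 20, 30, 40]   # percent reduction of one resource
averages = []
for i, pct in enumerate(reductions):
    # 10 replications per data point, each with its own random number stream.
    reps = [simulate_launches(pct, random.Random(1000 * i + k)) for k in range(10)]
    averages.append(sum(reps) / len(reps))

a, b, c = fit_quadratic(reductions, averages)  # coefficients of the trend line
```

Averaging before fitting smooths the replication noise at each data point; the fitted coefficients then describe the trend line for that resource.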

Having achieved our baseline point, we began running the cases to
show the effects of varying the support resources. Holding GSE and Parts
Availability constant, we made simulation runs decreasing Manpower by 10%,
20%, 30%, and 40% from the baseline point. The aircraft launched showed an
immediate impact, with a 10% decrease in Manpower causing a 0.5% decrease in
our parameter. As the Manpower decreased, its effects rapidly increased
until the step from a 30% to a 40% decrease caused a jump from 8% fewer
aircraft launched to 18% fewer aircraft launched. This indicates that any
further decrease in available Manpower would severely limit the capability
of the CH-47C to perform its missions.

The Ground Support Equipment simulation cases were handled the same
way. While holding Manpower and Parts Availability constant, the GSE was
reduced 10%, 20%, ..., 70%. The graph of the results was generally the
same; however, GSE had a more gradual initial impact than Manpower showed.
GSE did not have an accelerating effect until it had been decreased by
approximately 50% of its initial strength.

Parts Availability was decreased to 90%, 80%, ..., 50% while Manpower and
GSE were held constant. However, the effects of Parts Availability did not
follow the same general slope as Manpower and GSE. Its shape is more a
straight line than curved. The effects of Parts Availability apparently
are linear, at least through a large reduction in the Parts Availability.
This may be due to the large number of variables involved in this area, such
as inventory restock delay time (3 days for this paper) or probability of
remove and replace.

After viewing the results curves from our experiment, the analyst can
give answers to the question of how a reduction in support resources can
affect his particular system. However, the analyst must realize that in
any given situation, he must do more than was done in this paper. We did
our experiment with certain simplifying assumptions and certain variables
as constants. The analyst should review these and see if they are adequate
for his particular situation. If they are not, he should change them so
they are applicable. He must also take into consideration what the decision
maker wants. For instance, if the goal is to reduce the cost of resources
while causing the least impact on the system, the analyst must calculate the
total cost involved in reducing resources. He may be able to reduce a high
cost resource that has a large impact on his system, compensate for the
decrease with an increase of another resource, and still save the necessary
dollars, as opposed to just reducing the resource with the least impact on
his system and hoping to get the necessary cost saving, which may not
happen. Also, the analyst may find that for reasons beyond his control,
he may not be able to reduce the resources that his analysis tells him should
be reduced.

All the above situations are just reasons why the question of "resources
impacting availability" is not an easy one. However, in answering these
questions, the analyst does have a tool that will help him do his job: the
ARMS model. If used correctly, it can be a great help.

[Figure: CH-47C Simulation Model Scenario, Baseline]

[Figure: Baseline CH-47C Model Parameters]

[Figure: CH-47C Baseline Parameters: Total Calls, MA's Waiting]

[Figure: Resources (Baseline)]

USE  OF  LOGNORMAL  CONFIDENCE  BOUNDS  ON  RELIABLE  LIFE 
WHEN  THE  TRUE  LIFE  DISTRIBUTION  IS  NOT  LOGNORMAL 


Eugene  E.  Coppola 
Benet Weapons Laboratory
Watervliet  Arsenal 
Watervliet,  NY  12189 


1. Reliable Life and Its Lower Confidence Bound

Reliable life is that time S during which a specified proportion R of
a population of devices will operate continuously without failure. The
proportion R is called the reliability. Reliable life is especially
important for devices which can fail catastrophically; that is, failure of
the device can result in the destruction of the device and possibly
surrounding equipment and also possible injury or death to operating
personnel. Cannon components such as tubes and breeches fall into this
category. For such catastrophically-failing devices, it is important that
the device be operated only during the time for which the probability of
successful operation (R) is high. For cannon components, R is generally
specified to be 0.999. The reliable life for cannon components is also known
as safe life, and we will use the two terms interchangeably.

For a new device, reliable life is not known and must be estimated from
test data. For cannon components a confidence requirement is added. That
is, it must be shown with a specified confidence level C that the actual
reliable life exceeds a given value. For cannon components, C is generally
specified as 0.9. In practice, because of the confidence requirement, point
estimates of safe life are not used; instead a lower confidence bound on
safe life at level C is used. The lower confidence bound will be called
lower confidenced safe life (LCSL).

For cannon components, catastrophic failures are caused by fatigue
cracks. Consequently, safe life is important only for fatigue failures.
There are other ways that cannon components can fail (e.g., excessive wear
in tubes) but these are fail-safe types of failure and hence are ignored in
safe life determination. Fatigue testing, even with the laboratory
simulation techniques employed today, is very expensive and time consuming.
This greatly limits the amount of data that can be collected for any one
type of device. The generally accepted method today is to test six
specimens to failure and to base safe life calculations on these.

Because data is limited and the specified reliability is so high,
non-parametric and distribution-free methods do not give good results.
Consequently, it is necessary to assume that the failure times follow a
distribution of known mathematical form. The lognormal and Weibull
distributions are commonly used for this purpose, although there has never
been enough data from any one particular device to make a determination of
the true failure distribution. There are some theoretical justifications
behind both the lognormal and Weibull distributions, but we shall not
consider them here.
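To make the definition of reliable life concrete, the sketch below computes S for a lognormal life distribution, one of the two forms mentioned above. The median life and log-scale standard deviation are hypothetical values chosen only for illustration; the function name is likewise an assumption.

```python
from math import exp, log
from statistics import NormalDist

def lognormal_reliable_life(median, sigma, reliability):
    """Time S with P(life > S) = reliability for a lognormal life distribution."""
    z = NormalDist().inv_cdf(1.0 - reliability)  # lower-tail standard normal quantile
    return median * exp(sigma * z)

median_life, sigma = 10000.0, 0.25               # hypothetical population values
s = lognormal_reliable_life(median_life, sigma, 0.999)

# Check the definition: the proportion surviving beyond S should equal R.
survival = 1.0 - NormalDist().cdf(log(s / median_life) / sigma)
print(round(s), round(survival, 6))
```

With R as high as 0.999 the reliable life sits far below the median life, which is why a small sample gives so little direct information about it.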


2. Fracture Mechanics Model of Gun Tube Fatigue

Using Paris' equation for rate of crack growth and experimental results,
Throop and others at Watervliet Arsenal have developed a deterministic model
of fatigue crack growth in gun tubes [5,7]. After some manipulation of
Throop's equations [5], the following equation relating crack depth b to
number of cycles N results:

$$N = \frac{1}{Gk}\left(b_0^{-k} - b^{-k}\right) \qquad (1)$$

where:

$b_0$ = the initial crack depth, assumed to be present after a few
rounds of firing

$G = C\,\dfrac{\left[\alpha S\sqrt{\pi}\right]^{2k+1}}{E\,\sigma_y K_{IC}}$

$k$ = a parameter dependent on material properties and stress
intensity

$S$ = maximum hoop stress at the bore

$\alpha$ = a parameter depending on crack shape and on the residual
stresses introduced by the autofrettage process

$E$ = Young's modulus

$\sigma_y$ = yield strength

$K_{IC}$ = fracture toughness for a crack in a tangential stress field

$C$ = a parameter varying with k to maintain dimensional homogeneity
and possibly depending on material properties

From equation 1, one can calculate the number of cycles N required for
the crack to reach a critical depth at which fatigue failure occurs, provided
one knows the relevant material properties. The material properties,
however, vary from tube to tube; that is, they are random.


3. Computer Simulation of Fatigue Failure

Using Throop's model, Racicot [5] performed Monte-Carlo simulations to
generate a large number of pseudo-fatigue lives that could then be analyzed
statistically. However, there is not sufficient data at this time to
determine the distributions of the material properties ($b_0$, $k$, $S$,
$\alpha$, $E$, $\sigma_y$, $K_{IC}$, and $C$) that appear in Throop's model.
Racicot therefore assumed that each of the material properties had the same
type of distribution and that this type of distribution was either normal,
lognormal or Weibull. The parameters of the assumed material-properties
distributions were estimated from experimental data. The present author [2]
has extended Racicot's results by considering more general cases. For each
run of the author's simulation program, the program was instructed to pick
at random a distribution type for each of the material properties. The
parameters of the material-property distributions were then estimated from
experimental data. This method was intended to allow some independence from
unwarranted assumptions. The present author has also considered cases where
the material properties are correlated; Racicot assumed that most of the
material properties were statistically independent. In this manner, we
obtained several sets of simulated fatigue data, each of which could be
examined statistically.
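A minimal sketch of one such simulation run is given below. It is not the author's program. It assumes that equation 1 has the integrated Paris-law form N = (b_0^{-k} - b^{-k})/(Gk), treats only the initial crack depth and the lumped parameter G as random, holds k and the critical depth fixed, and picks between normal and lognormal types only (Weibull is omitted to keep the moment matching simple). All numerical values and function names are hypothetical.

```python
import random
from math import log, sqrt

def cycles_to_depth(b0, b, G, k):
    """Assumed integrated crack-growth law: N = (b0**-k - b**-k) / (G * k)."""
    return (b0 ** -k - b ** -k) / (G * k)

def make_sampler(rng, mean, cv):
    """Pick a distribution type at random for one property, then return a
    function drawing values with the given mean and coefficient of variation."""
    kind = rng.choice(["normal", "lognormal"])
    if kind == "normal":
        # Guard against non-positive draws, which are not physical here.
        return lambda: max(rng.gauss(mean, cv * mean), 1e-6 * mean)
    s2 = log(1.0 + cv * cv)            # log-scale variance matching the cv
    mu = log(mean) - 0.5 * s2          # log-scale mean matching the mean
    return lambda: rng.lognormvariate(mu, sqrt(s2))

rng = random.Random(1)
sample_b0 = make_sampler(rng, 0.01, 0.10)    # initial crack depth (hypothetical units)
sample_G = make_sampler(rng, 1.0e-3, 0.15)   # lumped growth-rate parameter (hypothetical)
k, b_critical = 1.0, 1.0                     # held fixed in this sketch

# One run: a population of simulated fatigue lives.
lives = [cycles_to_depth(sample_b0(), b_critical, sample_G(), k)
         for _ in range(10000)]
```

The distribution type is chosen once per property per run, so each run defines a different population of fatigue lives.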

One question of great interest was whether the simulated data could be
described by various parametric distribution families. This problem was
approached in the standard way. For each parametric family, the parameters
were estimated from the simulated data to obtain an approximating
distribution. The approximating distribution could then be compared to the
simulated data by goodness-of-fit statistics. We used three goodness-of-fit
statistics: Kolmogorov-Smirnov (KS), Cramer-von Mises (CVM) and
Anderson-Darling (AD). (See reference 6 for definitions and uses of these.)
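As an illustration of the fit-then-compare procedure, the sketch below computes the Kolmogorov-Smirnov statistic for a lognormal family fitted to a data set. The two test samples are synthetic and serve only to show that a poorly fitting family produces a larger statistic; they are not the paper's simulated fatigue data.

```python
import random
from math import log, sqrt
from statistics import NormalDist

def ks_statistic_lognormal(data):
    """KS distance between the empirical cdf and a lognormal fitted to the data."""
    logs = sorted(log(x) for x in data)
    n = len(logs)
    mu = sum(logs) / n
    sd = sqrt(sum((y - mu) ** 2 for y in logs) / (n - 1))
    fitted = NormalDist(mu, sd)        # lognormal fit = normal fit on the log scale
    d = 0.0
    for i, y in enumerate(logs):
        f = fitted.cdf(y)
        # The empirical cdf jumps from i/n to (i+1)/n at each ordered point.
        d = max(d, abs(f - i / n), abs((i + 1) / n - f))
    return d

rng = random.Random(7)
lognormal_sample = [rng.lognormvariate(9.3, 0.22) for _ in range(2000)]
uniform_sample = [rng.uniform(5000.0, 20000.0) for _ in range(2000)]

d_good = ks_statistic_lognormal(lognormal_sample)  # family matches the data
d_poor = ks_statistic_lognormal(uniform_sample)    # family does not match
```

The same pattern applies to the other families: estimate the parameters, evaluate the fitted cdf at the ordered data, and compare with the empirical cdf.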

As an example of the sort of results obtained, Figure 1 shows the
frequency histogram for the data of Run #1, consisting of 10,000 simulated
fatigue lives. Figure 2 shows the empirical cumulative distribution
function (cdf) of the simulated data, along with the approximating
distributions from several parametric distribution families. None of them
really gives a good fit. In Table 1 we show the goodness-of-fit statistics
calculated for several distribution families. The lognormal distribution
gives the best fit (the smaller the goodness-of-fit statistic, the better
the fit). The Birnbaum-Saunders runs a close second. Weibull and exponential
distributions do not fit nearly as well.


These results were typical for the simulated data; the lognormal or the
Birnbaum-Saunders gave the best fit. They were quite close together and
generally did much better than the other distributions. In his studies,
Racicot concluded that the lognormal gave the best fit (he did not consider
the Birnbaum-Saunders) and recommended that the lognormal distribution be
used in the future for fatigue life studies. The only problem is that the
present author has shown that although the lognormal distribution usually
does give better fits, the fit is not totally acceptable. In fact, the
goodness-of-fit statistics in most cases were significantly too large, thus
leading to a rejection of lognormality. It then becomes important to know
how well procedures derived from the assumption of lognormality work even
though the fatigue life distribution is probably not lognormal.

4. Birnbaum-Saunders vs. Lognormal

Before we consider the adequacy of the lognormal, we should explain why
we are not performing a similar analysis for the Birnbaum-Saunders
distribution, even though the Birnbaum-Saunders and the lognormal fit about
equally well. Actually, the closeness of the Birnbaum-Saunders and the
lognormal was anticipated on theoretical grounds. The cdf of the
Birnbaum-Saunders distribution is given by:

$$F_1(x;\alpha,\beta) = \begin{cases} \Phi\left(\dfrac{1}{\alpha}\left[\left(\dfrac{x}{\beta}\right)^{1/2} - \left(\dfrac{x}{\beta}\right)^{-1/2}\right]\right), & x > 0 \\ 0, & x \le 0 \end{cases}$$

where

$$\Phi(x) = \int_{-\infty}^{x} \frac{e^{-y^2/2}}{\sqrt{2\pi}}\,dy$$

is the standard normal cdf. $\alpha$ and $\beta$ are positive unknown
parameters. The lognormal cdf is given by:

$$F_2(x;\alpha,\beta) = \begin{cases} \Phi\left(\dfrac{1}{\alpha}\ln\left(\dfrac{x}{\beta}\right)\right), & x > 0 \\ 0, & x \le 0 \end{cases}$$

where $\alpha$ and $\beta$ are positive unknown parameters.

Now suppose that X is a random variable with cdf $F_1(\cdot;\alpha,\beta)$.
Let $Y = (X/\beta)^{1/\alpha}$. The cdf of Y is easily seen to be:

$$G(y;\alpha,\beta) = \Phi\left(\frac{y^{\alpha/2} - y^{-\alpha/2}}{\alpha}\right) \quad \text{for } y > 0$$

and 0 for $y \le 0$. Now let $\alpha$ approach 0. For any y > 0,

$$\lim_{\alpha \to 0} \frac{y^{\alpha/2} - y^{-\alpha/2}}{\alpha} = \ln y .$$

Consequently, for all y,

$$\lim_{\alpha \to 0} G(y;\alpha,\beta) = \Phi(\ln y) ,$$

which is the standard lognormal distribution.

The above suggests that for small $\alpha$, the Birnbaum-Saunders
distribution $F_1(x;\alpha,\beta)$ can be approximated by the lognormal
distribution $F_2(x;\alpha,\beta)$. The opposite is also true: for small
$\alpha$, the lognormal $F_2(x;\alpha,\beta)$ can be approximated by the
Birnbaum-Saunders $F_1(x;\alpha,\beta)$. The difference
$F_1(x;\alpha,1) - F_2(x;\alpha,1)$ is shown in Figures 3 and 4. The
approximation is quite good for small $\alpha$. In fact, data from gun tube
fatigue tests suggests that $\alpha$ will usually be small (less than 0.3).
We also observed small $\alpha$ for the simulation data. In practice,
therefore, the Birnbaum-Saunders and the lognormal will be so close as to be
nearly interchangeable. However, the lognormal is much easier to deal with
in practice. So we have chosen to ignore the Birnbaum-Saunders even though
it fits about as well as the lognormal.
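The closeness of the two distributions for small shape parameter can be checked numerically. The sketch below evaluates the Birnbaum-Saunders cdf, in the standard form Phi of ((x/beta)^(1/2) - (x/beta)^(-1/2))/alpha, and the lognormal cdf, Phi of ln(x/beta)/alpha, and reports the largest absolute difference on a grid. The grid range and the function names are choices made for this illustration.

```python
from math import log, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf

def birnbaum_saunders_cdf(x, alpha, beta):
    if x <= 0:
        return 0.0
    r = x / beta
    return PHI((sqrt(r) - 1.0 / sqrt(r)) / alpha)

def lognormal_cdf(x, alpha, beta):
    if x <= 0:
        return 0.0
    return PHI(log(x / beta) / alpha)

def max_abs_difference(alpha, beta=1.0):
    """Largest |F1 - F2| over a grid from 0.01*beta to 5*beta."""
    grid = [beta * 0.01 * (i + 1) for i in range(500)]
    return max(abs(birnbaum_saunders_cdf(x, alpha, beta)
                   - lognormal_cdf(x, alpha, beta)) for x in grid)

for alpha in (0.1, 0.3, 1.0):
    print(alpha, max_abs_difference(alpha))
```

The difference shrinks as the shape parameter decreases, consistent with the limit argument above.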

5. Methods of Confidencing Safe Life

In the past, there have been 3 main schemes for calculating LCSL
used at Watervliet Arsenal. The first assumes the underlying failure
distribution is lognormal; the other two assume that the underlying
distribution is Weibull. In the following we assume that
$X_1, \ldots, X_N$ are identically distributed, independent fatigue lives
obtained from testing.

Method I: Lognormal MLE Method

This method is based on the maximum likelihood estimates (MLE's) of
the lognormal distribution. (See Ref. 4, p. 264-268 for a fuller
exposition of this method.) Let:

$$\bar{y} = \frac{1}{N}\sum_{i=1}^{N} \ln X_i$$

$$\hat{\sigma}^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(\ln X_i - \bar{y}\right)^2$$

The LCSL $S_I$ is given by:

$$S_I = \exp\left(\bar{y} - K_N(R,C)\,\hat{\sigma}\right)$$

where $K_N(R,C)$ is a tolerance factor dependent on R, C and N.
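A sketch of Method I follows. Tabulated tolerance factors are normally used; here, as an assumption made to keep the example self-contained, the one-sided normal tolerance factor is estimated by Monte Carlo as the C-quantile of (z_R + zbar)/s, where zbar and s are the mean and standard deviation of N standard normal draws and z_R is the R-quantile of the standard normal. The six fatigue lives are hypothetical.

```python
import random
from math import exp, log, sqrt
from statistics import NormalDist

def mean_sd(values):
    """Sample mean and standard deviation (N - 1 divisor)."""
    n = len(values)
    m = sum(values) / n
    return m, sqrt(sum((v - m) ** 2 for v in values) / (n - 1))

def tolerance_factor(n, reliability, confidence, reps=20000, seed=0):
    """Monte Carlo estimate of the one-sided normal tolerance factor K_N(R, C)."""
    rng = random.Random(seed)
    z_r = NormalDist().inv_cdf(reliability)
    ratios = []
    for _ in range(reps):
        zbar, s = mean_sd([rng.gauss(0.0, 1.0) for _ in range(n)])
        ratios.append((z_r + zbar) / s)
    ratios.sort()
    return ratios[int(confidence * reps)]  # empirical C-quantile

def lcsl_lognormal(lives, reliability=0.999, confidence=0.9):
    """Lower confidenced safe life: exp(ybar - K * sigma_hat) on the log scale."""
    ybar, sigma_hat = mean_sd([log(x) for x in lives])
    k = tolerance_factor(len(lives), reliability, confidence)
    return exp(ybar - k * sigma_hat)

lives = [9800.0, 11200.0, 12500.0, 10400.0, 13900.0, 11900.0]  # hypothetical test results
print(round(lcsl_lognormal(lives)))
```

With only six specimens and R = 0.999 the tolerance factor is large, so the LCSL falls well below the shortest observed life.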

Method II: Weibull BLIE Method

This method is based on the Best Linear Invariant Estimators (BLIE's)
of the extreme-value distribution. For this method, the underlying fatigue
life distribution is assumed to be 2-parameter Weibull. The extreme-value
distribution enters the picture because the logarithm of a random variable
with a 2-parameter Weibull distribution has an extreme-value distribution.
(See Mann, Schafer, and Singpurwalla [4] for a fuller exposition of this
method.) One calculates two numbers $\eta$ and $\xi$ which are the BLIE's of
extreme value location and scale and are basically just weighted sums of
the logarithms of the failure times, the weights depending on sample size
N. The LCSL is given by:

$$S_{II} = \exp\left(\eta - L_N(R,C)\,\xi\right)$$

where $L_N(R,C)$ is a tolerance factor (not the same as the tolerance factor
in Method I) depending on R, C and N.

While Method I and Method II are superficially similar, it has been
found in practice that Method II will generally give a much smaller LCSL
than Method I. The author conjectures that some general law is at work
that requires $S_I > S_{II}$ with high probability but he has not been able
to show this.

The third method involves a Bayesian scheme devised by Clarke [1].
This method involves much laborious computation and we will not consider
it here. Most often, this Bayesian method gives an LCSL intermediate
in value between $S_{II}$ and $S_I$.

6. Adequacy of Methods of Confidencing Safe Life

Because of the random method of selection of material-property
distribution, each run of the simulation program effectively establishes a
possible fatigue life population from which we can select random samples.
Each population has its own true safe life, which can be estimated fairly
well. We can then perform simulation studies for each population to
determine how well the methods given above for constructing LCSL actually
work.

The most important property of a lower confidence bound is that it
underestimates the true quantity with a given probability C, the
confidence level. Both Methods I and II are derived from assumptions on
the underlying failure distribution. Let us call the "nominal confidence
level" the confidence level C one would have if the appropriate assumptions
were true, and the "actual confidence level" the probability $C_A$ that the
method in question produces an LCSL less than the true safe life. If the
assumption from which the method is derived is true, then $C_A = C$. As
mentioned above, the assumption of an underlying lognormal or Weibull
distribution is probably not true. So we will probably have $C_A \ne C$. If
$C_A > C$, the method gives conservative bounds, that is, we are actually
underestimating more often than we think we are. Because we are dealing
with devices that can fail catastrophically, a conservative method is more
to be desired than a non-conservative one.
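The actual confidence level of a method can be estimated by repeated sampling from a population whose true safe life is known. The sketch below does this for the lognormal method applied to a lognormal population, the case in which the actual and nominal levels should agree. The population parameters are hypothetical, the tolerance factor is estimated by Monte Carlo rather than taken from tables, and any other population sampler could be passed to `actual_confidence`; nothing here reproduces the paper's runs.

```python
import random
from math import exp, log, sqrt
from statistics import NormalDist

def mean_sd(values):
    n = len(values)
    m = sum(values) / n
    return m, sqrt(sum((v - m) ** 2 for v in values) / (n - 1))

def tolerance_factor(n, reliability, confidence, reps=20000, seed=0):
    """Monte Carlo estimate of the one-sided normal tolerance factor."""
    rng = random.Random(seed)
    z_r = NormalDist().inv_cdf(reliability)
    ratios = []
    for _ in range(reps):
        zbar, s = mean_sd([rng.gauss(0.0, 1.0) for _ in range(n)])
        ratios.append((z_r + zbar) / s)
    ratios.sort()
    return ratios[int(confidence * reps)]

def actual_confidence(draw_life, true_safe_life, n, k, reps=1000, seed=1):
    """Fraction of simulated samples whose LCSL falls below the true safe life."""
    rng = random.Random(seed)
    below = 0
    for _ in range(reps):
        ybar, s = mean_sd([log(draw_life(rng)) for _ in range(n)])
        if exp(ybar - k * s) < true_safe_life:
            below += 1
    return below / reps

R, C, N = 0.999, 0.9, 6
mu, sigma = 9.35, 0.22                    # hypothetical log-scale mean and sd
true_safe_life = exp(mu + sigma * NormalDist().inv_cdf(1.0 - R))
k = tolerance_factor(N, R, C)
c_a = actual_confidence(lambda rng: rng.lognormvariate(mu, sigma),
                        true_safe_life, N, k)
print(round(c_a, 3))
```

Because the estimate comes from 1000 replicates it is itself a random quantity, with a standard error of roughly 0.01 near a confidence level of 0.9.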

The lognormal MLE and Weibull BLIE methods (I and II) give conservative
bounds for all runs. The actual confidence levels $C_A$ were estimated from
1000 simulated replicates of samples for various R and N and for nominal
confidence C = 0.9. Some results are shown in Figures 5 through 8. The
estimated true confidence levels are of course random variables, which
accounts for the jaggedness of the curves in these figures. However,
the main point here is not so much to determine the true confidence level
but to determine if $C_A > C$. For all of our $C_A$'s, except for a few in
Run #2 with R = 0.9, we do indeed have $C_A > C$ with a 90% confidence. We
can therefore conclude that the lognormal MLE and Weibull BLIE methods do
give conservative confidence bounds.

An additional interesting fact emerges from these graphs. It appears
that while both methods are conservative, the Weibull BLIE method is more
conservative (that is, it gives a larger $C_A$) than the lognormal MLE
method. This would suggest that the lognormal MLE method is to be preferred
to the Weibull BLIE method.

7.  Conclusions 


The lognormal distribution, while generally yielding better fits to
the simulated fatigue data than the other distributions considered, is
probably not the exact fatigue life distribution. Methods derived from
the lognormal are generally conservative and can be used. However, the
lognormal may be overly conservative for large reliabilities and better
methods probably exist. Methods derived from the Weibull distribution are
extremely conservative for large reliabilities and should be avoided.

Table 1

Goodness-of-Fit Results
Run #1

                              Parameters              Goodness-of-Fit Statistics
Distribution Type        Location  Shape   Scale     KS     CVM(x10^1)  AD

Normal                   11737     -       2570      0.072  16.6        10.6
Lognormal                -         4.594   11463     0.051  9.5         6.2
Extreme-Value            12894     -       2004      0.13   61.9        48.9
Weibull                  -         5.893   12642     0.10   38.7        29.5
Exponential 1-parameter  -         -       11738     0.47   594.7       282.5
Exponential 2-parameter  5630      -       6108      0.30   287.5       149.9
Double-Exponential       10581     -       2004      0.063  16.5        10.8
Inverse-Weibull          -         5.893   10393     0.037  28.0        20.2
Birnbaum-Saunders        -         0.2178  11466     0.052  9.6         6.2

[Figure: Confidenced Safe Life, Lognormal MLE Method, Nominal Confidence 90%]

FIGURE 6
Run #2
Confidenced Safe Life, Lognormal MLE Method
Nominal Confidence 90%

DOUBLE TESTING IN BINOMIAL DATA
G.  R.  Andersen 

Battlefield  Systems  Integration,  HQ  DARCOM 
Alexandria,  Virginia  22333 


ABSTRACT:

Suppose that a sample $w_1, w_2, \ldots, w_N$ of size N is drawn at random
from some infinite population. Each element of this sample is to be
classified as defective or non-defective according to one or more tests.
To be specific, denote by $T_0$ a (preliminary) test which, although it
classifies each element of the sample as defective or non-defective, may
do so incorrectly. Denote by $T_1$ a (primary) test which also classifies
members of the sample, but does so without error. $T_0$ is often called
a fallible test, while $T_1$ is called an infallible test. This paper
discusses some aspects of the problem of estimating the probability p,
that an element of the population is non-defective, on the basis of the
sample $w_1, \ldots, w_N$, when all the members of this sample are subjected
to the $T_0$-test, but only a subsample of size n (n < N) is tested according
to $T_1$. This problem has been referred to in the literature (e.g.,
Tenenbein (1)) as "estimating from Binomial data with misclassifications."

For convenience, we will identify the symbols $T_0$ and $T_1$, representing
the tests, with numerical valued functions which assign the value 0 to a
defective and the value 1 to a non-defective sample item.

This paper will only be concerned with those tests $T_0$ which are necessary
for $T_1$, in the sense that $T_0(w_i) = 0$ implies with probability one that
$T_1(w_i) = 0$. That is, passing the $T_0$ test is a necessary condition for
passing the $T_1$ test. Examples of such tests are numerous; they are
sometimes thought of as screening tests. In the field of reliability, think
of attempting to judge the reliability of items in a stockpile by applying
a cheap (nondestructive) test to a large sample followed by an expensive
test applied to some of the items which pass the first test.

(1) Tenenbein, A., "A Double Sampling Scheme for Estimating from Binomial
Data with Misclassifications", Journal of the Amer. Statist. Assoc.,
Sept 1970, Vol. 65

The difference then between this problem and the one studied in the
Tenenbein paper is that here the size of the second sample, the subsample,
is random. This is because here the subsample is drawn from those members
of the original sample which pass the $T_0$-test; whereas, in Tenenbein's
paper the size of the subsample does not depend on the number of members
in the initial sample which pass the fallible test.

Of course, if every sample member that passed a (necessary) $T_0$-test
was subjected to the $T_1$-test, then the appropriate estimate of p would be
the classical estimate. In the application that prompted this study both the
fallible $T_0$-test and the infallible $T_1$-test were costly. Therefore,
long before the test was run, the initial sample size N for the $T_0$-test
and a nonrandom subsample size, v, for the $T_1$-test had to be specified.
Hence, the classical estimate of p would result only if, by chance, $S_N$,
the number of $T_0$-successes, did not exceed v. However, the size of the
subsample, in general, could only be stated to be n = minimum($S_N$, v).
Therefore, the need arose to find a way of judging which values of N and v
to choose.
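The two-stage scheme with random subsample size can be simulated in a few lines. In the sketch below, `p0` is the probability of passing the fallible test and `p1_given_pass` is the probability of passing the infallible test given a pass on the fallible one, so that p = p0 * p1_given_pass when the first test is necessary for the second. The plug-in estimate (S_N/N)(X/n) is an illustrative assumption, not the estimator quoted from this paper, and all numerical values are hypothetical.

```python
import random

def double_test(N, v, p0, p1_given_pass, rng):
    """Simulate one application of the two-stage testing scheme."""
    s_n = sum(rng.random() < p0 for _ in range(N))             # T0 successes
    n = min(s_n, v)                                            # subsample size for T1
    x = sum(rng.random() < p1_given_pass for _ in range(n))    # T1 successes
    return s_n, n, x

def plug_in_estimate(N, s_n, n, x):
    """Illustrative estimate of p: (fraction passing T0) * (fraction passing T1)."""
    return 0.0 if n == 0 else (s_n / N) * (x / n)

rng = random.Random(3)
N, v = 200, 50                      # hypothetical design values
p0, p1_given_pass = 0.8, 0.9        # hypothetical; p = 0.72
runs = [double_test(N, v, p0, p1_given_pass, rng) for _ in range(2000)]
estimates = [plug_in_estimate(N, s_n, n, x) for s_n, n, x in runs]
print(sum(estimates) / len(estimates))
```

Repeating the simulation over a grid of N and v values gives a rough picture of how the spread of the estimate changes with the design, which is the kind of comparison needed when choosing N and v under cost constraints.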

As usual certain "precision-in-estimation" statements were required, so
the question was, first of all, what is the best estimator of p in this
setup and, secondly, what should N and v be in order to guarantee that a
certain level of precision will be achieved in estimation, subject to
constraints on the costs of testing.

1. SUMMARY OF RESULTS: A precise statement of the problem considered
here is given in Section 2* and the maximum likelihood estimator,
$\hat p_{Nv}$, of p together with some exact distribution results are given
in Section 3. The relationship between the results of this note and those
in A. Tenenbein's paper (1) is explained in Section 3, Remark 3.4, where
the exact and asymptotic variance of $\hat p_{Nv}$ is presented. (The exact
variance is not obtained in the problem considered in (1).) Basically, in
the context of Tenenbein's work, this amounts to showing how much the
asymptotic variance of $\hat p_{Nv}$ is reduced when the preliminary $T_0$
is necessary for $T_1$ (and so can misclassify in only one direction as
opposed to both directions as in (1)); this reduction in the variance of
$\hat p_{Nv}$ cannot be obtained from Tenenbein. The asymptotic properties
of the estimator and an associated statistic are derived in Section 4.
Both randomly and nonrandomly standardized forms of the central limit
theorem are given for $\hat p_{Nv}$ and the statistic giving the exact
number of successes in the second sample of size $\min(S_N, v)$.

Approximate confidence intervals for p are derived in Section 5.
Realizations of these confidence intervals have different functional forms
depending on whether the observed number, $S_N$, of $T_0$ successes is
greater than, or less than, or equal to, v.

In Section 6, the required modifications to A. Tenenbein's (1) results
on sample size determination based on cost and precision are given for
necessary tests.


*This article and the others noted below will appear in a paper which is
being prepared for printing in a national journal.


ANALYSIS OF CENSORED SURVIVAL DATA¹


Norman  Breslow 

University  of  Washington 
Seattle,  WA.  98195 


ABSTRACT 


Recent developments in the methodology of censored survival
data are briefly reviewed. These include estimation of the survival
curve, non-parametric tests for the comparison of r survival curves,
tests for trend, and the regression analysis of survival data. A
final section provides some additional references to the recent
literature.


1. INTRODUCTION

Censored survival data arise in a wide variety of statistical
investigations. In clinical trials one measures duration of response
from start of treatment until relapse or death due to disease.
Observations on response time are censored for those subjects
still in remission at the study's end, as they are for patients
lost to follow-up during the course of the study. Animal
carcinogenesis studies, such as those used by the Food and Drug
Administration to determine the safety of food additives, provide
another example of censored survival data. Here the endpoint is the
age at diagnosis of a particular kind of cancer, while censorship
occurs because of death due to other causes including sacrifice. In
tests of the reliability of missile components, failure times are
measured from the start of testing until failure of the component,
with censorship imposed by the failure of other components or the
necessity of analyzing the data before all items have failed. While
all of these types of data occur widely in practice, the presentation
below emphasises the clinical trial since that is the area of
application with which the author is most familiar.

¹ Paper prepared for the 23rd Conference on Design of Experiments in
Army Research, Development and Testing held at the Naval Postgraduate
School in Monterey, October 1977.

Figure 1 illustrates the results for the control group in a
clinical trial reported by Heyn et al. (1974). This trial was
designed to investigate the effects of combined chemotherapy as an
adjunct to surgery and radiation in the treatment of childhood
rhabdomyosarcoma. The endpoint for analysis was the re-appearance
of tumor, whether at the site of original treatment or through
distant metastasis, so that children who remained disease-free at the
time the data were analyzed had censored observations. In addition
to the control arm IA, there were two groups of children who received
the drugs actinomycin-D (ACT-D) and vincristine (VCR): group IB were
concurrently randomized with the controls, both these groups having
apparently had their tumors completely resected; while IIA consisted
of patients with microscopic residual disease at the margin of
surgical resection.

Interim data from all three arms are presented in Table 1. Note
that the censored observations for arm IA, those in the column
labelled "disease-free", are smaller in the table than they are in
the figure. This is because the figure was drawn from data computed
at a later point in time, when additional follow-up was available
for most patients who had not already died.

Analysis of censored survival data such as presented in Table 1
has several goals. First one wants an estimate of the survival curve,
the probability of surviving t units of time, for each of the
comparison groups. Statistical tests are required next to determine
whether the observed differences between the curves are real or are
simply chance effects. If real, a method of quantifying the nature of
the differences is desirable. Finally there may be available
concomitant observations, including continuous measurements such as
age at diagnosis, whose joint effects on survival are important to
determine.

2.  ESTIMATION  OF  SURVIVAL  CURVES 
The first step in the analysis of censored survival data is to
form a series of 2 x r contingency tables as shown in Table 2. One
table is formed for each of the K distinct times
$0 = t_0 < t_1 < t_2 < \cdots < t_K$ at which deaths (or failures,
relapses, etc.) occur. The column totals $n_{ik}$ refer to the total
number of subjects in the $i$th group who remain "at risk", i.e. alive
and under observation, just prior to time $t_k$. The tabular entries
$d_{ik}$ and $s_{ik}$ denote the numbers of these who die at $t_k$, and
survive $t_k$, respectively. Table 3 illustrates the calculation of the
first three such tables for the data in Table 1. Here r = 3 and
$t_1 = 2$, $t_2 = 3$ and $t_3 = 9$ months. Note that the tables for
increasing k refer to a constantly diminishing population "at risk" as
additional subjects die or are withdrawn (censored) from further
observation.

j 

Kaplan and Meier (1958) derived the maximum likelihood
non-parametric estimate of the survival curve based on censored data.
This may be calculated recursively from the entries in the 2 x r
contingency tables shown in Table 2. Starting from $\hat P(t_0) = 1$, and
suppressing the group index i, the recursion formula is

$$\hat P(t_k) = \hat P(t_{k-1}) \left( \frac{s_k}{n_k} \right) \tag{1}$$

for k = 1, 2, ..., K. In other words, the probability of surviving
past $t_k$ is estimated as the probability of surviving past $t_{k-1}$
times the conditional probability of surviving past $t_k$ given survival
to $t_k$. The curves remain flat between failure times. Because of the
multiplicative structure (1), Kaplan and Meier refer to their estimate
as the product limit (PL) estimate. In case there is no censorship in
the data, this reduces to the familiar empirical distribution function.
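
The recursion (1) can be sketched in a few lines of Python. This is only an illustration: the function name `product_limit` is hypothetical, and the counts are the group IA values listed in Table 4. Exact fractions are used, so the printed values may differ in the third decimal from a table built by chaining rounded products.

```python
from fractions import Fraction

def product_limit(at_risk, surviving):
    """Return the PL survival estimates P(t_1), ..., P(t_K).

    at_risk[k]   -- n_k, number at risk just prior to the k-th failure time
    surviving[k] -- s_k, number of those who survive the k-th failure time
    """
    estimates = []
    p = Fraction(1)  # P(t_0) = 1
    for n_k, s_k in zip(at_risk, surviving):
        p *= Fraction(s_k, n_k)  # recursion (1)
        estimates.append(p)
    return estimates

# Group IA interim counts at months 2, 3, 9, 10, 15, 16 and 30.
n = [15, 14, 13, 12, 9, 7, 4]
s = [14, 13, 12, 10, 8, 6, 3]
curve = product_limit(n, s)
print([round(float(p), 3) for p in curve])
```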

Table 4 shows the calculation of the relapse-free survival
curve from the interim data in Table 1 for treatment group IA. The
corresponding curves calculated from final study data for all three
treatment groups are shown in Figure 2. Numbers above each curve at
annual intervals in this figure refer to numbers of patients still
at risk in each group. These are an important means of judging the
stability of the estimates. Such estimates can in fact be quite
unstable in the tail of the survival distribution where few subjects
remain at risk.

A more formal method of judging the stability of the PL estimate
is to calculate its variance. Kaplan and Meier provided a variance
formula for their estimate, which may also be expressed recursively.
Starting from $V\{\hat P(t_0)\} = 0$ this is defined by

$$V\{\hat P(t_k)\} = V\{\hat P(t_{k-1})\}\left(\frac{s_k^2}{n_k^2}\right) + \{\hat P(t_{k-1})\}^2 \left(\frac{d_k s_k}{n_k^3}\right). \tag{2}$$

Breslow and Crowley (1974) show that in large samples $\hat P(t)$ is
approximately normally distributed with mean equal to the true survival
function P(t) and a variance which may be estimated from (2). Note that
neither $\hat P(t)$ nor $V\{\hat P(t)\}$ will change after the last
uncensored response time in each group, even though additional subjects
continue to be withdrawn from observation. In this region the estimated
variance often does not accurately reflect the variability in the
estimated survival, which will be substantial unless large numbers
remain on study.
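
A minimal sketch of the recursive variance calculation follows. It assumes the recursion takes the form V_k = V_{k-1}(s_k/n_k)^2 + P_{k-1}^2 d_k s_k / n_k^3, which is algebraically the same as Greenwood's formula P_k^2 times the sum of d_j/(n_j s_j); the function name `pl_with_variance` is hypothetical.

```python
def pl_with_variance(at_risk, deaths):
    """Return [(P(t_k), V{P(t_k)})] for each failure time.

    at_risk[k] -- n_k, number at risk just prior to the k-th failure time
    deaths[k]  -- d_k, number failing at that time (s_k = n_k - d_k survive)
    """
    p, v = 1.0, 0.0  # P(t_0) = 1 and V{P(t_0)} = 0
    out = []
    for n_k, d_k in zip(at_risk, deaths):
        s_k = n_k - d_k
        # Update the variance first: it needs the previous value P(t_{k-1}).
        v = v * (s_k / n_k) ** 2 + p ** 2 * d_k * s_k / n_k ** 3
        p = p * s_k / n_k
        out.append((p, v))
    return out

# Without censoring the variance reduces to the binomial p(1 - p)/N.
for p_k, v_k in pl_with_variance([4, 3, 2], [1, 1, 1]):
    print(round(p_k, 4), round(v_k, 6))
```

In the uncensored example each variance equals p(1 - p)/4, which is a quick check that the recursion is coded correctly.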


3. COMPARISON OF SURVIVAL CURVES; THE LOG RANK TEST
A very simple but powerful non-parametric test for the
comparison of r survival curves with censored data may also be
calculated from the series of 2 x r contingency tables shown in Table 2.
This test exploits the fact that, under the null hypothesis of no
difference in the underlying survival distributions and conditional
upon fixed values for the marginal totals in the 2 x r table, the
vector $d_k = (d_{1k}, \ldots, d_{rk})'$ of observed deaths at the time
$t_k$ has an r-dimensional hypergeometric distribution. Consequently the
null expectation of the number of deaths in group i at $t_k$ is

$$e_{ik} = E(d_{ik}) = n_{ik}\,\frac{D_k}{N_k},$$

i.e. the number at risk in the $i$th group times the death rate for all
r groups combined. An illustration of this calculation is given in
Table 3 for the interim study data. The covariance matrix $V_k$ of $d_k$
has, under the null hypothesis, an (i,j) component equal to

$$(V_k)_{ij} = \begin{cases}
\dfrac{n_{ik}(N_k - n_{ik})\,D_k S_k}{N_k^2 (N_k - 1)}, & i = j \\[2ex]
-\dfrac{n_{ik}\,n_{jk}\,D_k S_k}{N_k^2 (N_k - 1)}, & i \neq j .
\end{cases}$$
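
As an illustration of the single-table calculation, the sketch below computes the null expectations and the hypergeometric covariance matrix for one 2 x r table. The function name `table_moments` is hypothetical, and the example counts are those of the t = 9 months table in Table 3 (13, 17 and 11 at risk; 1, 1 and 0 relapses). The code assumes at least two subjects remain at risk.

```python
from fractions import Fraction

def table_moments(at_risk, deaths):
    """Null mean e_k and covariance V_k of the death vector for one table.

    at_risk[i] -- n_ik, number at risk in group i just prior to t_k
    deaths[i]  -- d_ik, number of deaths in group i at t_k
    Assumes N_k = sum(at_risk) is at least 2.
    """
    N = sum(at_risk)
    D = sum(deaths)
    S = N - D
    r = len(at_risk)
    e = [Fraction(n_i * D, N) for n_i in at_risk]
    scale = Fraction(D * S, N * N * (N - 1))
    V = [[scale * (at_risk[i] * (N - at_risk[i]) if i == j
                   else -at_risk[i] * at_risk[j])
          for j in range(r)]
         for i in range(r)]
    return e, V

e, V = table_moments([13, 17, 11], [1, 1, 0])
print([round(float(x), 3) for x in e])
```

Each row of V sums to zero, which reflects the constraint that the deaths in the r groups add up to the fixed total D_k.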


The main idea behind the test is to sum up the statistics
calculated from each of the K 2 x r tables into a vector

$$O = \sum_k d_k$$

of observed numbers of deaths in each group, a vector

$$E = \sum_k e_k$$

of expected numbers of deaths, and a summary covariance matrix

$$V = \sum_k V_k .$$

Since the K 2 x r tables refer to overlapping sets of subjects
they are not, strictly speaking, statistically independent.
Nevertheless Cox (1975) has shown that the conditional distributions
for the observation vectors $d_k$ may be formally regarded as
independent, so that V is an appropriate covariance matrix for O-E.
V is a singular covariance matrix of rank r-1. This corresponds to
the fact that $\sum_i O_i = \sum_i E_i$ is the total number of deaths
observed in all r groups. However by defining O* and E* to be the first
r-1 components of O and E, and V* the (r-1) x (r-1) upper left hand
corner of V, a test statistic for testing equality of the r survival
curves is obtained as

$$T_1 = (O^* - E^*)'\, V^{*-1} (O^* - E^*).$$

This is distributed as chi-square on r-1 degrees of freedom under
the null hypothesis.
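
The whole calculation can be sketched as follows, assuming the data have already been reduced to one pair of lists (numbers at risk, numbers of deaths) per distinct death time. The names `solve` and `log_rank` are hypothetical, and the small exact linear solver is included only to keep the example self-contained.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination on exact fractions."""
    n = len(b)
    M = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(A, b)]
    for c in range(n):
        piv = next(i for i in range(c, n) if M[i][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        M[c] = [x / M[c][c] for x in M[c]]
        for i in range(n):
            if i != c and M[i][c] != 0:
                f = M[i][c]
                M[i] = [x - f * y for x, y in zip(M[i], M[c])]
    return [row[n] for row in M]

def log_rank(tables):
    """tables: list of (at_risk, deaths) pairs, one per distinct death time.

    Returns the vectors O and E, the matrix V and the statistic T1.
    """
    r = len(tables[0][0])
    O = [Fraction(0)] * r
    E = [Fraction(0)] * r
    V = [[Fraction(0)] * r for _ in range(r)]
    for at_risk, deaths in tables:
        N, D = sum(at_risk), sum(deaths)
        S = N - D
        for i in range(r):
            O[i] += deaths[i]
            E[i] += Fraction(at_risk[i] * D, N)
        if N > 1:  # a table with one subject contributes no variance
            scale = Fraction(D * S, N * N * (N - 1))
            for i in range(r):
                for j in range(r):
                    V[i][j] += scale * (at_risk[i] * (N - at_risk[i]) if i == j
                                        else -at_risk[i] * at_risk[j])
    diff = [O[i] - E[i] for i in range(r - 1)]      # O* - E*
    v_star = [row[:r - 1] for row in V[:r - 1]]     # upper left corner of V
    x = solve(v_star, diff)                         # V*^{-1} (O* - E*)
    T1 = sum(d * xi for d, xi in zip(diff, x))
    return O, E, V, T1

# First three tables of the interim data (t = 2, 3 and 9 months).
tables = [([15, 17, 11], [1, 0, 0]),
          ([14, 17, 11], [1, 0, 0]),
          ([13, 17, 11], [1, 1, 0])]
O, E, V, T1 = log_rank(tables)
print([float(o) for o in O], [round(float(x), 3) for x in E], round(float(T1), 3))
```

Dropping the last group is an arbitrary choice: because V is singular in exactly the direction of the omitted component, the value of T1 does not depend on which group is left out.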

The test $T_1$ was first proposed for survival data by Mantel
(1966). Cox (1972) later derived it from likelihood theory under
the proportional hazards (PH) model, in which the instantaneous
death rates in the r groups are in constant ratio throughout the
follow-up period. (This model is discussed further below.) Peto
and Peto (1972), considering only the case r = 2, argued that it
was an asymptotically efficient test under Cox's model and dubbed
it the "log rank" test.

A conservative approximation to $T_1$ which requires no matrix
inversion is given by the familiar chi-square formula

$$T_2 = \sum_{i=1}^{r} \frac{(O_i - E_i)^2}{E_i} .$$

While $T_2 \le T_1$, in fact the two will be quite close provided that
there are few ties among the uncensored survival times (i.e. most
of the $D_k$ in Table 2 are unity) and that the patterns of censorship
operating in the r groups are not grossly different. See Peto and
Pike (1973) and Crowley and Breslow (1975) for discussion of this
approximation.

Table 5 illustrates the manner of presentation of the summary
and test statistics for the interim study data. Note the calculation
of the ratio O/E of observed to expected numbers of deaths in
each treatment group. These are very useful as measures of treatment
effect since their ratios, e.g. $(O_1/E_1) \div (O_2/E_2)$, estimate the
relative death rates in the respective treatment groups (Breslow,
1975).


4.  ALTERNATE  WEIGHTING  SCHEMES;  THE  GEHAN/BRESLOW  TEST 
The summary statistics O-E weight the observed differences
$d_k - e_k$ in each table in a manner which is appropriate to the PH
model already mentioned. However this is not the only possible weighting
scheme. Multiplying the observed differences before summing by $N_k$,
the total number of subjects in the $k$th table, gives more weight to
the earlier times $t_k$ when larger numbers are at risk. This leads
to the scores

$$W_i = \sum_{k=1}^{K} \left( N_k d_{ik} - n_{ik} D_k \right),$$

covariance matrix

$$V_W = \sum_{k=1}^{K} N_k^2\, V_k ,$$

and test statistic

$$T_3 = W^{*\prime}\, V_W^{*-1}\, W^* ,$$

where the asterisks (*) denote the corresponding r-1 dimensional
quantities. A conservative approximation to this statistic not
requiring matrix inversion is denoted $T_4$.

The scores $W_i$ may also be obtained from a pairwise comparison
of the observations in the $i$th treatment group with those in the
remaining r-1 groups. Each such pair is assigned the value +1 (or -1)
according as the true survival time for the first pair member is
known to be smaller than (or larger than) that for the second
member. Ties or indeterminate comparisons due to censorship are
assigned 0 values. Gehan (1965) suggested the use of such scores for
the comparison of two samples (r = 2), noting that the resulting test
essentially reduced to the familiar Wilcoxon rank sum test when
there was no censorship. Breslow (1970) extended this work to the
case of r > 2 samples, proposing also the covariance matrix $V_W$ and
the statistic $T_3$. This latter statistic is valid for situations where
the patterns of censorship operative in the r treatment groups are
unequal, as in animal carcinogenesis studies where there is
differential toxic mortality. The conservative approximation $T_4$ is
strictly valid only where there is equality of censorship.
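
The equivalence of the two descriptions of the scores can be checked numerically. The sketch below computes them once by pairwise comparison and once from the 2 x r tables, as the differences d_ik - e_ik multiplied by N_k and summed over k. All names and the small data set are hypothetical. One convention is assumed: a subject censored at time t is taken to survive beyond t, so a death at t is known to precede a censoring at the same t.

```python
def gehan_pair(t1, died1, t2, died2):
    """+1 if the first member is known to fail first, -1 if the second, else 0."""
    if died1 and (t1 < t2 or (t1 == t2 and not died2)):
        return 1
    if died2 and (t2 < t1 or (t1 == t2 and not died1)):
        return -1
    return 0  # tie between two deaths, or indeterminate because of censoring

def gehan_scores_pairwise(data, r):
    """data: list of (time, died, group) with group in 0, ..., r-1."""
    W = [0] * r
    for t1, d1, g1 in data:
        for t2, d2, g2 in data:
            if g1 != g2:
                W[g1] += gehan_pair(t1, d1, t2, d2)
    return W

def gehan_scores_tables(data, r):
    """Same scores from the tables: sum over k of N_k d_ik - n_ik D_k."""
    W = [0] * r
    for t in sorted({u for u, died, _ in data if died}):
        n = [sum(1 for u, _, g in data if u >= t and g == i) for i in range(r)]
        d = [sum(1 for u, died, g in data if u == t and died and g == i)
             for i in range(r)]
        N, D = sum(n), sum(d)
        for i in range(r):
            W[i] += N * d[i] - n[i] * D
    return W

# Hypothetical observations: (time, died?, group).
data = [(2, True, 0), (3, True, 0), (3, False, 1), (5, True, 1),
        (5, True, 2), (7, False, 0), (8, True, 2), (9, False, 1)]
print(gehan_scores_pairwise(data, 3), gehan_scores_tables(data, 3))
```

The scores sum to zero over the groups, which is why only r-1 of them enter the test statistic.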


In practice the tests $T_1$ and $T_3$ often yield rather similar
numerical values. However this is not always true and some comments
on the proper interpretation when only one statistic is significant
are in order. Since $T_3$ weights early values more heavily,
it may achieve significance when there is an early separation
between the survival curves which later come together or even cross
over. $T_1$ gives more weight to the latter part of the curves, and
would detect differences in the curves which only appeared later
on. Such behavior often indicates an interaction between treatment
and time on the instantaneous death rates, which is worthy of
investigation in its own right.


5.  TESTING  FOR  TREND 

In many situations the r treatment groups will correspond to
r different levels or dosages of some quantitative variable x, say
$x_1 < x_2 < \cdots < x_r$. In such cases the global chi-square tests
$T_1$ and $T_3$ are notoriously lacking in power. One would prefer
instead a single degree of freedom test for trend in survival with
increasing dose.

Fortunately, such tests for trend are readily calculated from
the summary statistics already at hand. In the case of the O and E
analysis, one uses

$$T_5 = \frac{\{x'(O - E)\}^2}{x' V x}$$

as a single degree of freedom chi-square for a linear trend of O-E
with x, and

$$T_6 = T_1 - T_5$$

as a chi-square on r-2 degrees of freedom for deviations from
linearity (Tarone, 1975).

Similarly, when using the W scores,

$$T_7 = \frac{\{x' W\}^2}{x' V_W\, x}$$

provides a test for linear trend of these scores with x and

$$T_8 = T_3 - T_7$$

a test for deviations from linearity.
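
A short sketch of the single degree of freedom trend statistic, {x'(O-E)}^2 divided by x'Vx, is given below. The function name `trend_statistic` and the two-group numbers are hypothetical. The same function serves for the W scores if W is passed as `O`, a zero vector as `E`, and the covariance matrix of the scores as `V`.

```python
from fractions import Fraction

def trend_statistic(x, O, E, V):
    """Chi-square (1 d.f.) for linear trend of O - E with the dose scores x."""
    r = len(x)
    numerator = sum(x[i] * (O[i] - E[i]) for i in range(r)) ** 2
    denominator = sum(x[i] * V[i][j] * x[j] for i in range(r) for j in range(r))
    return numerator / denominator

# Hypothetical two-group summary statistics.
O = [3, 1]
E = [2, 2]
V = [[Fraction(4, 5), Fraction(-4, 5)],
     [Fraction(-4, 5), Fraction(4, 5)]]
print(trend_statistic([0, 1], O, E, V))
```

Because the components of O-E sum to zero and the rows of V sum to zero, adding a constant to every dose score leaves the statistic unchanged; only the spacing of the scores matters.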


6.  ADJUSTMENT  BY  STRATIFICATION 
When  it  is  thought  that  the  r comparison  groups  may  differ 
with  respect  to  factors  which  influence  survival,  an  adjusted  or 
stratified  analysis  which  corrects  for  the  confounding  effects  of 
such  variables  is  in  order.  Such  an  analysis  is  carried  out  very 
simply,  as  follows. 

First, divide the population into strata which are more or
less homogeneous internally with respect to the confounding variable
or variables. Of course there is a limitation on the number of
confounders which may be simultaneously accommodated in this fashion
since if strata become too large in number, and small in size, a
large loss of comparative information may result.

Next, perform separate survival analyses within each stratum.
This means calculation of the survival curves and especially the
summary statistics O, E, V, W and $V_W$ defined earlier. These summary
statistics are then cumulated by simple addition over strata.

Finally, calculate the adjusted test statistics $T_1$, $T_2$, $T_5$,
and $T_6$ just as before using the cumulated summary statistics O, E
and V in place of the stratum specific ones. Likewise calculate $T_3$,
$T_7$, and $T_8$ using the adjusted or cumulated W and $V_W$.

7. REGRESSION ANALYSIS OF SURVIVAL DATA: THE PH MODEL

If the number of confounding concomitant variables is very
large, the stratified analysis approach quickly breaks down due to
large numbers of strata with just one or a few subjects in each.
Furthermore, it may be of interest to quantify the relationship
between survival times and concomitant variables, some of which may
be continuous. This situation calls out for some kind of regression
model.

A usual (normal theory) regression approach would specify
that the survival times, or some transform such as their logarithm,
were equal to a linear combination of the concomitant variables
plus some random error term. While not impossible, the generalisation
of such models for use with censored data may be quite awkward
and computationally involved. Thus considerable interest was
aroused by Cox (1972) when he proposed an alternative type of
regression model formulated in terms of the effect of the regression
variables on death rates rather than times of death. Statistical
analysis under this model turned out to be much more tractable than
for those others proposed earlier. Furthermore, it avoided any
parametric assumptions about the shape of the underlying survival
curve.

Cox's model is defined in terms of the time t specific death
rate or hazard function $\lambda(t|z)$ for an individual having a
p-vector of covariates z. Specifically he assumes

$$\lambda(t|z) = \exp(\beta' z)\,\lambda_0(t),$$

where $\beta$ is an unknown p-vector of parameters (regression
coefficients), while $\lambda_0(t)$ is the unknown hazard or death rate
function for an individual with a standard (z = 0) set of covariates.
A consequence of this model is that the ratio of hazard functions for
two individuals with different sets of covariates,

$$\frac{\lambda(t|z_1)}{\lambda(t|z_2)} = \exp\{\beta'(z_1 - z_2)\},$$

does not depend on time, whence the title proportional hazards (PH)
model.

Several authors (Cox, 1972, 1975; Kalbfleisch and Prentice,
1973; Breslow, 1974, 1975) have developed the likelihood analysis of
the PH model from rather distinct points of view. Provided that
there are no ties in the uncensored data, all derive for the
ln-likelihood function of $\beta$ the expression

$$L(\beta) = \sum_{k=1}^{K} \Big\{ \beta' z_k - \ln \sum_{j \in R(t_k)} \exp(\beta' z_j) \Big\},$$

where $R(t_k)$ is the risk set of subjects still alive and under
observation at $t_k - 0$; $z_k$ is the covariate vector for the
individual who dies at $t_k$; and the outer summation is over all K true
(uncensored) times of death. In case of ties, the three approaches
yield somewhat different likelihoods; see also Efron (1977).
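
The ln-likelihood is simple to evaluate directly. The following is a sketch under the stated no-ties assumption, with a hypothetical function name and data layout (parallel lists of times, death indicators and covariate vectors); it evaluates L at a given coefficient vector but does not maximize it.

```python
import math

def cox_log_likelihood(beta, times, died, z):
    """ln-likelihood of the PH model, assuming no tied death times.

    times[i] -- observed time for subject i
    died[i]  -- True for a death, False for a censored observation
    z[i]     -- list of covariates for subject i
    """
    def linear(zi):
        return sum(b * x for b, x in zip(beta, zi))

    total = 0.0
    for k, (t_k, dead) in enumerate(zip(times, died)):
        if not dead:
            continue  # censored subjects enter only through the risk sets
        # Risk set R(t_k): everyone still under observation just before t_k.
        risk = [j for j, t in enumerate(times) if t >= t_k]
        total += linear(z[k]) - math.log(sum(math.exp(linear(z[j])) for j in risk))
    return total

times = [1, 2, 3, 4]
died = [True, True, False, True]
z = [[0.0], [1.0], [1.0], [0.0]]
print(cox_log_likelihood([0.0], times, died, z))
```

At a zero coefficient vector every subject has the same weight, so L reduces to minus the sum of the logarithms of the risk-set sizes at the death times, which gives a convenient check.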

Taking the vector of first partial derivatives of L, setting it
equal to 0 and solving the resulting non-linear equations yields
a maximum likelihood estimate $\hat\beta$ for the regression
coefficients. A covariance matrix for this estimate is obtained in the
usual fashion by inversion of the negative of the matrix of second
partials of L. The integral

$$\Lambda_0(t) = \int_0^t \lambda_0(u)\,du$$

defines the cumulative hazard function for the standard covariate
set. Once $\hat\beta$ is obtained this may be estimated by

$$\hat\Lambda_0(t) = \sum_{t_k \le t} \frac{1}{\sum_{j \in R(t_k)} \exp(\hat\beta' z_j)},$$

where the outer summation is again over true survival times $t_k$ less
than or equal to t. The corresponding estimate of the survival
function

$$P_0(t) = \exp\{-\Lambda_0(t)\}$$

is

$$\hat P_0(t) = \prod_{t_k \le t} \Big\{ 1 - \frac{1}{\sum_{j \in R(t_k)} \exp(\hat\beta' z_j)} \Big\}.$$

Notice that when $\hat\beta = 0$ this reduces to the PL estimate of Kaplan
and Meier, calculated from the entire set of observations considered
as one homogeneous sample.
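
A sketch of both baseline estimates, again assuming no tied death times, is shown below; the function name and data layout are hypothetical, and the coefficient vector is taken as given rather than fitted. The example uses a zero coefficient so that the reduction to the PL estimate can be seen directly.

```python
import math

def baseline_estimates(beta, times, died, z):
    """Return [(t_k, cumulative hazard, survival)] at each death time t_k.

    The cumulative hazard adds 1 / (sum of exp(beta'z_j) over the risk set)
    at each death time; the survival estimate multiplies by one minus that
    same quantity.
    """
    def weight(zi):
        return math.exp(sum(b * x for b, x in zip(beta, zi)))

    cum_hazard, survival, out = 0.0, 1.0, []
    for t_k in sorted(t for t, dead in zip(times, died) if dead):
        denom = sum(weight(z[j]) for j, t in enumerate(times) if t >= t_k)
        cum_hazard += 1.0 / denom
        survival *= 1.0 - 1.0 / denom
        out.append((t_k, cum_hazard, survival))
    return out

times = [1, 2, 3, 4]
died = [True, True, False, True]
z = [[0.0]] * 4
for t_k, h, p in baseline_estimates([0.0], times, died, z):
    print(t_k, round(h, 4), round(p, 4))
```

With a zero coefficient the risk-set sums are just the numbers at risk (4, 3 and 1 here), so the survival column reproduces the product limit values for the pooled sample.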


8.  FURTHER  READING 

Much of the above material is presented in greater detail in
my review article (Breslow, 1975) on the PH model and its applications
to survival data. Some additional applications of this model
to epidemiologic data are outlined in a forthcoming paper (Breslow,
1978). Peto, Pike, et al. (1976, 1977) present a thorough discussion
of the use of the model in the design and analysis of clinical
trials.

A computer program for calculating the PL estimate and all the
test statistics presented in sections 2-5 above is available from
Thomas, Breslow and Gart (1977).

Several authors have pointed out that the W scores defined in
section 4 do not lead to the most efficient generalization of
Wilcoxon's test to censored data. They all propose essentially the
same statistic as an alternate generalization. See Efron (1965),
Peto and Peto (1972), and Prentice (1978).

A comparison of the efficiencies of the test statistics using
Monte Carlo techniques is made by Lee et al. (1975). Efron (1977)
discusses the efficiency of the ln-likelihood function L for the
PH model from a more abstract viewpoint.

Extensions of the PH regression model for use with grouped or
heavily tied data are discussed by Cox (1972), Kalbfleisch and
Prentice (1973), Thompson (1977) and Prentice and Gloeckler (1978).

The above figure, as well as the next one, were first published in
Volume 34:2128-2141, 1974 of the journal CANCER. They appeared
in an article by Heyn, R., Holland, R., Newton, W. A., Tefft, M.,
Breslow, N., and Hartmann, J., entitled "The Role of Combined
Chemotherapy in the Treatment of Rhabdomyosarcoma in Children".

We appreciate the fact that the editor, Dr. J. E. Rhoads of CANCER
and Dr. Ruth Heyn gave their permission to reproduce Figures 1 and 2
in these Proceedings.


FIGURE 2

[Plot; vertical axis: % REMAINING DISEASE FREE]

The duration of the disease-free interval in patients
from Part IA (control), IB (treated), and IIA (microscopic
residual, treated). Shown above each curve at 24 and 48
months are the numbers of patients known to be disease-free
after those time periods.


TABLE 1


TABLE 2

FORMATION OF 2 x r CONTINGENCY TABLES COMPARING
DEATH RATES AMONG r TREATMENT GROUPS AT EACH
DISTINCT TIME OF DEATH

                                        Treatment Group
                                   1      2     ...    r      Total
Deaths (at t_k)                   d_1k   d_2k   ...   d_rk     D_k
Survivors                         s_1k   s_2k   ...   s_rk     S_k
Patients followed to time t_k     n_1k   n_2k   ...   n_rk     N_k


TABLE 3

ILLUSTRATION OF 2 x r TABLES FOR
CCG614 INTERIM STUDY DATA

t = 2 months
                   IA       IB       IIA      Total
Relapsed            1        0        0         1
Disease-Free       14       17       11        42
"At Risk"          15       17       11        43
Expected         0.349    0.395    0.256     1.000

t = 3 months
                   IA       IB       IIA      Total
Relapsed            1        0        0         1
Disease-Free       13       17       11        41
"At Risk"          14       17       11        42
Expected         0.333    0.405    0.262     1.000

t = 9 months
                   IA       IB       IIA      Total
Relapsed            1        1        0         2
Disease-Free       12       16       11        39
"At Risk"          13       17       11        41
Expected         0.634    0.829    0.537     2.000


TABLE 4

ESTIMATION OF SURVIVAL CURVE FOR GROUP IA
BY METHOD OF KAPLAN AND MEIER (1958)

Month     Number      Number        Conditional     Survival
          At Risk     Surviving     Probability     Probability
 t_k       n_1k        s_1k                          P(t_k)
  2         15          14            0.933           0.933
  3         14          13            0.929           0.866
  9         13          12            0.923           0.799
 10         12          10            0.833           0.666
 15          9           8            0.888           0.592
 16          7           6            0.857           0.507
 30          4           3            0.750


TABLE 5

SUMMARY STATISTICS FOR CCG614 INTERIM DATA

                              Treatment Group
                            IA       IB       IIA
No. of pts. (N)             15       17       11
Relapses observed (O)        8        3        1
Relapses expected (E)     3.11     4.99     3.90
O/E                       2.57     0.60     0.26

T1 = 11.10, 2 d.f., p = 0.004
T2 = 10.77, 2 d.f., p = 0.005
T3 = 11.?1, 2 d.f., p = 0.003


1. Introduction

The Jackknife technique is becoming so familiar to statisticians that
it is almost not necessary to reintroduce it with each article. However,
to establish notation and to aid any reader encountering the Jackknife
for the first time, a brief definition is given.

Let $Y_1, \ldots, Y_n$ be n independent random variables identically
distributed according to the distribution function F, which depends on
an unknown parameter $\theta$. The aim of the statistical analysis is to
estimate or test $\theta$. The Jackknife technique can be applied to any
estimation procedure which for any sample size gives a point estimate
$\hat\theta(Y_1, \ldots, Y_n) = \hat\theta$ of $\theta$. The ith deleted
estimate is the estimate obtained by applying the estimation procedure
to the sample with the ith random variable removed, i.e.,

$$\hat\theta_{-i} = \hat\theta(Y_1, \ldots, Y_{i-1}, Y_{i+1}, \ldots, Y_n). \tag{1}$$

Corresponding to the ith deleted estimate is the ith pseudo-value

$$\tilde\theta_i = n\hat\theta - (n-1)\hat\theta_{-i}. \tag{2}$$

The Jackknifed estimate of $\theta$ is the average of these pseudo-values
$\tilde\theta_i$, i = 1, ..., n, as each random variable is deleted in
turn, i.e.,

$$\tilde\theta = \frac{1}{n}\sum_{i=1}^{n}\tilde\theta_i = n\hat\theta - \frac{n-1}{n}\sum_{i=1}^{n}\hat\theta_{-i}. \tag{3}$$
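
The definitions (1) to (3) translate almost line for line into code. The sketch below uses hypothetical names (`jackknife`, `biased_variance`) and a made-up sample. The example estimator is the variance with divisor n, chosen because its bias is exactly of order 1/n, so the jackknifed value coincides with the usual unbiased sample variance.

```python
def jackknife(estimator, sample):
    """Return the jackknifed estimate and the list of pseudo-values."""
    n = len(sample)
    theta_hat = estimator(sample)
    pseudo_values = []
    for i in range(n):
        deleted = sample[:i] + sample[i + 1:]          # sample without Y_i
        theta_minus_i = estimator(deleted)             # definition (1)
        pseudo_values.append(n * theta_hat - (n - 1) * theta_minus_i)  # (2)
    return sum(pseudo_values) / n, pseudo_values       # average, as in (3)

def biased_variance(ys):
    """Variance with divisor n, a biased estimator of the variance."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys) / len(ys)

sample = [1.0, 2.0, 4.0, 7.0]
estimate, pseudo = jackknife(biased_variance, sample)
print(biased_variance(sample), estimate)
```

For this sample the biased variance is 5.25 while the sum of squares divided by n - 1 is 7; the jackknifed estimate recovers the latter up to floating point error.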


If n is large, a variation of the Jackknife can be invoked to save
on the computation time. The modification is to divide the total sample
into g groups of size k each (n = g·k), and to successively delete
each one of the groups rather than single random variables. The ith
deleted estimate is now

$$\hat\theta_{-i} = \hat\theta(Y_1, \ldots, Y_{(i-1)k}, Y_{ik+1}, \ldots, Y_n), \tag{4}$$

and the corresponding pseudo-value is

$$\tilde\theta_i = g\hat\theta - (g-1)\hat\theta_{-i}. \tag{5}$$

The Jackknifed estimate is still the average of the pseudo-values, i.e.,

$$\tilde\theta = \frac{1}{g}\sum_{i=1}^{g}\tilde\theta_i. \tag{6}$$


Quenouille (1949, 1956) introduced the Jackknife as a method of bias
reduction, and this aspect is surveyed in Section 2. On the other hand,
Tukey (1958) saw the Jackknife as a device for robust interval estimation,
and developments along this line are summarized in Section 3. Robust
point estimation has also been a rapidly developing field in recent years,
and the connection between it and the Jackknife through the influence
function is explored in Section 4. Application of the Jackknife to various
statistical problems is illustrated in Section 5.

In my 1974 review article, almost all methodological papers on the
Jackknife published before or during 1973 were listed. The reader is
referred to this earlier article for an extensive bibliography of papers
from that era. A few papers were missed (Collins (1970), Cronbach et al.
(1972), Hollander and Wolfe (1973), Mosteller (1971), and Pennel (1972)),
and these are included in the references to this paper. The final section
of this paper is a bibliography of all methodological papers on the
Jackknife published between 1974 and 1977 which have come to my attention.


2. Bias Reduction

Quenouille (1956) pointed out that if the estimator $\hat\theta$ for a sample
of size $n$ has the expectation

$$ E(\hat\theta) = \theta + \frac{a_1}{n} + \frac{a_2}{n^2} + \cdots , \tag{7} $$

then the jackknifed estimator eliminates the leading bias term, i.e.,

$$ E(\tilde\theta) = \theta + 0 - \frac{a_2}{n^2} + \cdots . \tag{8} $$
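A small numerical illustration of the bias-reduction property may help. The sketch below uses the divide-by-$n$ variance, whose expectation is $\sigma^2(1 - 1/n)$, so the bias consists of the $1/n$ term only; jackknifing it reproduces the usual unbiased sample variance. The function names and the data are illustrative assumptions, not taken from the paper.

```python
import statistics
from typing import Callable, List


def biased_variance(xs: List[float]) -> float:
    """Divide-by-n variance; its bias is exactly -sigma^2 / n."""
    m = statistics.fmean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def jackknife_estimate(
    xs: List[float], estimator: Callable[[List[float]], float]
) -> float:
    """Delete-one jackknife: n * theta_hat - (n - 1) * mean of deleted estimates."""
    n = len(xs)
    theta_hat = estimator(xs)
    deleted = [estimator(xs[:i] + xs[i + 1:]) for i in range(n)]
    return n * theta_hat - (n - 1) * statistics.fmean(deleted)


# Illustrative data (hypothetical values).
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3]
jackknifed = jackknife_estimate(data, biased_variance)
unbiased = statistics.variance(data)  # divide-by-(n - 1) variance
```

Here `jackknifed` and `unbiased` agree up to rounding error, since the only bias term is of order $1/n$ and the jackknife removes it.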


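This idea was generalized in Schucany, Gray, and Owen (1971). Let
$\hat\theta_1$ and $\hat\theta_2$ be two estimators of $\theta$ with expectations of the form

$$ E(\hat\theta_1) = \theta + f_1(n)\,b(\theta) , \qquad E(\hat\theta_2) = \theta + f_2(n)\,b(\theta) . \tag{9} $$

Then, the estimator

$$ \tilde\theta^{*} = \frac{\det\begin{pmatrix} \hat\theta_1 & \hat\theta_2 \\ f_1(n) & f_2(n) \end{pmatrix}}{\det\begin{pmatrix} 1 & 1 \\ f_1(n) & f_2(n) \end{pmatrix}} = \frac{f_2(n)\,\hat\theta_1 - f_1(n)\,\hat\theta_2}{f_2(n) - f_1(n)} \tag{10} $$

is an exactly unbiased estimator for $\theta$, i.e., $E(\tilde\theta^{*}) = \theta$.

The estimator $\tilde\theta^{*}$ is called the generalized jackknife. It includes
the standard jackknife as a special case with the identifications

$$ \hat\theta_1 = \hat\theta , \quad \hat\theta_2 = \frac{1}{n}\sum_{i=1}^{n}\hat\theta_{-i} , \quad f_1(n) = \frac{1}{n} , \quad f_2(n) = \frac{1}{n-1} . \tag{11} $$

By extending the definition (10) to include three or more estimators, the
second or higher order bias terms in the expansion (7) can be exactly
eliminated by the generalized jackknife. For more detail on these
generalizations the reader is referred either to Schucany, Gray, and Owen
(1971) or Gray and Schucany (1972).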
3. Robust Interval Estimation

In his 1958 abstract Tukey proposed that, as a method for robust
confidence interval construction, $\tilde\theta_1, \ldots, \tilde\theta_n$ could be treated as $n$
independently, identically distributed random variables with mean $\theta$.
The pseudo-values are clearly identically distributed provided $Y_1, \ldots, Y_n$
are independently, identically distributed and the estimators $\hat\theta(Y_1, \ldots, Y_n)$
treat the random variables symmetrically, so the major question hinges on
whether or not the pseudo-values behave as though they are approximately
independent. In this direction a great deal of research has been devoted
to learning when

$$ \frac{\tilde\theta - \theta}{\{\sum_{i=1}^{n}(\tilde\theta_i - \tilde\theta)^2 / n(n-1)\}^{1/2}} \stackrel{d}{\to} N(0,1) \tag{12} $$

as $n \to \infty$.

If $\tilde\theta$ is asymptotically normally distributed as indicated in (12),
then the interval estimate $\tilde\theta \pm g^{\alpha/2}\,(\sum_{i=1}^{n}(\tilde\theta_i - \tilde\theta)^2 / n(n-1))^{1/2}$, where $g^{\alpha/2}$
is the $1 - (\alpha/2)$ percentile point of the standard normal distribution,
gives a robust way of testing or bounding $\theta$. Folklore says that in place
of $g^{\alpha/2}$ it is better to use $t_{n-1}^{\alpha/2}$, where $t_{n-1}^{\alpha/2}$ is the percentile point
from a t distribution with $n-1$ degrees of freedom. The rationale for
this folklore ostensibly stems from a strong belief in the approximate
independence and normality of the pseudo-values, but, with just one exception,
all papers on distribution theory for the jackknife have focused on
establishing asymptotic normality. In fact, it is often the case in practice
that the jackknife t intervals are conservative (i.e., wider than necessary
for the nominally listed coverage), so it may be a better policy to
use the normal critical constant.
Under what circumstances is the Tukey proposal valid, and when is it
invalid? Miller (1964, 1968) proved that (12) is valid if the estimator
is a smooth function of a sample mean or means, i.e., $\hat\theta = f(\bar Y)$.
Estimators of this type include transformed averages and variances. This approach
was extended by Arvesen (1969) and Arvesen and Layard (1975) to functions
of U-statistics $\hat\theta = f(U)$ in order to handle variance component problems.
The proposition (12) is also true for functions of regression estimators
$f(\hat\beta)$ in the general linear regression model $Y = X\beta + e$ as shown by
Miller (1974b).

Brillinger (1964) took a different limiting approach by holding $g$
fixed and letting $k \to \infty$ for the grouped jackknife (4) - (6). His proof
shows that if $\hat\theta = f(\hat\phi)$ where $\hat\phi$ is a root of the likelihood equation

$$ \sum_{i=1}^{n} \frac{\partial \log p(y_i, \phi)}{\partial \phi} = 0 , \tag{13} $$

then as $k \to \infty$

$$ \frac{\tilde\theta - \theta}{\{\sum_{i=1}^{g}(\tilde\theta_i - \tilde\theta)^2 / g(g-1)\}^{1/2}} \stackrel{d}{\to} t_{g-1} . \tag{14} $$

This is the lone instance in which the t distribution, rather than the
normal, has been established as the approximating distribution. Typically,
however, one would prefer to have $g \gg k$, so the asymptotics do not
establish the t approximation in many problems.

Since the maximum likelihood estimator has a well-established
asymptotic distribution theory involving Fisher's information, the need for
jackknifing in this context has been questioned. However, in recent work
Reeds (1977) has answered these criticisms. Firstly, he has proved the
asymptotic normality for $\tilde\theta$ when $g \to \infty$ with $k = 1$, and secondly he
has shown that the jackknife gives the correct asymptotic variance for $\hat\theta$
and $\tilde\theta$ even if the model is incorrect. The Fisher information does not
do this because it is computed theoretically on the basis of the assumed
density $p(y,\phi)$. If the model is incorrect it may not be clear what $\hat\phi$
is estimating, but in problems like the location of a symmetric
distribution it will be. Reeds' work applies as well to more general M-estimators
and in this regard the reader should also see Brillinger (1976).

The basic ingredient needed in the estimator $\hat\theta$ for the Tukey
proposal to work is for it to be a smooth function of each $Y_i$. The proofs
depend upon power series expansions of the estimator in each of the random
variables. The common motif of the estimators mentioned above is that
asymptotically they are all functions of glorified means. By this I mean
that they are asymptotically equivalent to a function of a (possibly
weighted) sum of independent, identically distributed random variables.
This is true for U-statistics, regression coefficients, maximum likelihood
estimators, M-estimators, etc. In the cases where the jackknife is known
not to work, such as for the median or other percentile estimators, this
is not true.

Three  remarks  seem  in  order  before  closing  the  discussion  on  the  use 
of  the  jackknife  for  robust  interval  estimation. 

The first is that based on mean square error considerations in ratio
and other problems and on uniqueness criteria, the choice $g = n$ seems
best. (See, for example, Rao (1965) and Rao and Webster (1966).) However,
Hinkley (1977b) tentatively suggests that the accuracy of the t distribution
approximation may be improved by taking $k$ larger than one and that
there is little loss in efficiency for small $k > 1$. The argument for
this is that the skewness and kurtosis of $\sum Y_j / k$ (the sum running over
the $k$ observations in the $i$th group), which is the
dominant linear term in $\tilde\theta_i$, may be considerably improved by selecting $k$
slightly larger than one.

The second remark is that jackknifing does not correct for outliers.
The reader should not confuse large sample robustness of the jackknife
procedure for any underlying distribution with resistance to contaminating
observations in small or moderate size samples. In fact, the jackknife
appears to be rather sensitive to aberrant values. This sensitivity may
make it a useful device for detecting outliers in complicated estimation
problems. Trimming of the pseudo-values or application of other robust
procedures to them may be a good way of correcting for the outliers.
Hinkley (1976, 1977a) has started an investigation of this in the context
of correlation coefficient estimation.

The third and final remark is that if you are going to use a grouped
jackknife with $k > 1$, random selection of the groups is probably the
most sensible approach when the $Y_1, \ldots, Y_n$ are identically distributed,
but if the underlying random variables are not identically distributed,
then one presumably has the opportunity to do better. In particular, the
regression situation comes to mind. It may be possible to exploit the
pattern in the independent variable vectors associated with the $Y_i$
to form groups which give the jackknifed estimator a better mean squared
error or improved robustness in interval estimation. Hinkley and Miller
have some inconclusive results along this line, but at this point it is
difficult to see what the general principle of selection should be.

4.  Connection  with  Influence  Functions 

To establish the connection between the jackknife and the influence
function it is necessary to give a brief description of the latter. Von
Mises (1947) introduced the idea in his study of differentiable statistical
functions, but it remained relatively unnoticed for two decades until
investigators interested in robust estimation uncovered its usefulness (see
Hampel (1974)).

In many estimation problems the unknown parameter $\theta$ can be considered
to be a function $\theta = T(F)$ of the underlying distribution $F$, and its
estimator $\hat\theta$ to be the same function of the sample distribution function.
For example, in the case of the mean, $\theta = \int y\,dF(y)$ and
$\hat\theta = \int y\,dF_n(y) = \sum_{i=1}^{n} Y_i / n$.
The influence function $I(y,\theta)$ measures the amount of change in $T(F)$ for
an infinitesimal change in the weight assigned by $F$ to $y$. It is like a
partial derivative of $T$ with respect to a change in $F$ at coordinate $y$.
Specifically,

$$ I(y,\theta) = \lim_{\epsilon \to 0} \frac{T((1-\epsilon)F + \epsilon\delta_y) - T(F)}{\epsilon} , \tag{15} $$

where $\delta_y$ is the distribution function which places mass one at $y$.


Under regularity conditions the function $T$ can be expanded in a
series involving (15) and higher order derivatives. Specifically,

$$ T(G) = T(F) + \int I(y,\theta)\,dG(y) + \cdots . \tag{16} $$

In the case where $G$ is the sample distribution function $F_n$, the expansion
(16) and the identifications $\hat\theta = T(F_n)$, $\theta = T(F)$ give

$$ \hat\theta = \theta + \frac{1}{n}\sum_{i=1}^{n} I(Y_i,\theta) + \cdots . \tag{17} $$

The random variables $I(Y_i,\theta)$ are independently and identically distributed
with mean $\int I(y,\theta)\,dF(y) = 0$ and variance $\int I^2(y,\theta)\,dF(y)$. Since the
higher order terms in (17) are $O_p(n^{-1})$ under the regularity conditions,
the asymptotic distribution of $\hat\theta$ is given by

$$ \sqrt{n}\,(\hat\theta - \theta) \stackrel{d}{\to} N\Big(0, \int I^2(y,\theta)\,dF(y)\Big) . \tag{18} $$

The connection between the jackknife and the influence function is
that the pseudo-values give finite difference sample estimates of the
influence function. For

$$ \epsilon = -\frac{1}{n-1} \tag{19} $$

the quantity $(1-\epsilon)F + \epsilon\delta_y$ at $F = F_n$ and $y = Y_i$ becomes

$$ (1-\epsilon)F_n + \epsilon\delta_{Y_i} = \frac{n}{n-1}F_n - \frac{1}{n-1}\delta_{Y_i} = F_{n-1,-i} , \tag{20} $$

where $F_{n-1,-i}$ is the sample distribution function based on $n - 1$
observations with the $i$th observation deleted. If the finite difference
sample estimate of $I(y,\theta)$ at $Y_i$ for $\theta = \hat\theta = T(F_n)$ is defined by

$$ \hat I(Y_i,\hat\theta) = \left. \frac{T((1-\epsilon)F + \epsilon\delta_{Y_i}) - T(F)}{\epsilon} \right|_{F=F_n,\ \epsilon=-1/(n-1)} , \tag{21} $$

then it follows that

$$ \tilde\theta_i = n\hat\theta - (n-1)\hat\theta_{-i} = \hat\theta + (n-1)(\hat\theta - \hat\theta_{-i}) = \hat\theta + \hat I(Y_i,\hat\theta) , \tag{22} $$

because $\hat\theta_{-i} = T((1-\epsilon)F_n + \epsilon\delta_{Y_i})$.

If the influence function is sufficiently smooth so that $\hat I(y,\hat\theta)$
converges to $I(y,\theta)$ for all $y$ as $n \to \infty$, then each pseudo-value $\tilde\theta_i$
is approximately $\theta + I(Y_i,\theta)$. This means the jackknife will be behaving
correctly asymptotically because $\tilde\theta$ will be asymptotically normally
distributed with mean $\theta$ and variance $\int I^2(y,\theta)\,dF(y)/n$, which is the correct
limiting distribution of $\hat\theta$ for any underlying distribution function $F$.
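The identity between pseudo-values and finite-difference influence estimates can be checked numerically. The following Python sketch is illustrative only: the function names are hypothetical, and the example uses the sample mean, for which the influence function is $I(y,\theta) = y - \theta$, so the empirical influence values should equal $Y_i - \bar Y$.

```python
import statistics
from typing import Callable, List, Sequence


def pseudo_values(
    sample: Sequence[float], estimator: Callable[[Sequence[float]], float]
) -> List[float]:
    """Delete-one pseudo-values n * theta_hat - (n - 1) * theta_minus_i."""
    data = list(sample)
    n = len(data)
    theta_hat = estimator(data)
    values = []
    for i in range(n):
        deleted = data[:i] + data[i + 1:]
        values.append(n * theta_hat - (n - 1) * estimator(deleted))
    return values


def empirical_influence(
    sample: Sequence[float], estimator: Callable[[Sequence[float]], float]
) -> List[float]:
    """Finite-difference influence estimates: pseudo-value minus theta_hat."""
    theta_hat = estimator(list(sample))
    return [pv - theta_hat for pv in pseudo_values(sample, estimator)]


# Illustrative data (hypothetical values).
ys = [2.0, 4.0, 9.0]
influence = empirical_influence(ys, statistics.fmean)
```

For the mean, `influence` equals the deviations of the observations from the sample average, and the values sum to zero, mirroring the zero-mean property of the influence function.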

Huber (1972) had indicated that the jackknife should work properly
asymptotically for robust estimators with smooth influence functions. An
example is the trimmed mean, which has a continuous influence function. A
little algebra shows that the sample variance of the pseudo-values for the
trimmed mean approximately equals the Winsorized sample variance. The
latter is the correct variance to use with the trimmed mean so the jackknife
is performing as it should (see Cox and Hinkley (1974), p. 350). On the
other hand, the median and the Winsorized mean have discontinuous influence
functions. It is well known that the jackknife doesn't work for the former,
and it won't work for the latter either because it depends heavily on two
order statistics.

Two  recent  developments  are  worth  mentioning  before  closing  this 
section. 

Hinkley (1977a) has initiated an investigation into the second order
derivatives to see if there is any information in them which might improve
the performance of the jackknife. Specifically, he examines estimators
which admit the expansion

$$ \hat\theta = \theta + \frac{1}{n}\sum_{i} I(Y_i,\theta) + \frac{1}{2n^2}\sum_{i}\sum_{j} I_2(Y_i,Y_j,\theta) + \cdots $$

to see what effect the term involving the second derivative $I_2$ has on the
jackknife.

The jackknife operates by deleting observations. Thus, as a finite
difference approximation to the derivative $I(y,\theta)$, it subtracts mass at
$y$. Mallows has proposed an alternative finite difference approximation
which adds mass at $y$. In effect, this introduces a procedure which adds
hypothetical observations to the sample. For a discussion of this the
reader is referred to Devlin, Gnanadesikan, and Kettenring (1975). In a
similar spirit Efron (1977) has proposed inferential procedures based on
samples generated randomly according to the empirical distribution function
of the sample. He has coined the term bootstrap for these procedures, and
he has demonstrated that the jackknife is just a linear approximation to
the bootstrap.

5. Applications

i)  Ratios. One of the earliest applications of the jackknife
was to ratio problems. Let $X_1, \ldots, X_n$ be a sample with
theoretical mean $\mu$, and $Y_1, \ldots, Y_n$ be a sample with
theoretical mean $\eta$. The problem is to estimate $\theta = \eta/\mu$, and the
standard ad hoc estimator is $\hat\theta = \bar Y/\bar X$. Durbin (1959) showed
that jackknifing $\hat\theta$ improves not only its bias but also its
mean squared error in many cases. Later authors amplified on
these results and compared the jackknifed estimator with other
ratio estimators. For a full discussion of this application the
reader is referred to Miller (1974a).

ii)  Variances. The sensitivity of normal theory variance testing
procedures to departures from normality is well established.
Mosteller and Tukey (1968) and Miller (1968) proposed jackknifing
$\hat\theta = \log s^2$ as a way of handling this problem in robust fashion.
Shorack (1969) compared the jackknife estimator and some other
robust procedures for the two sample problem. These ideas also
extend to robustly handling the k sample problem and variance
component problems. For a fuller discussion on this area the
reader is referred to Miller (1974a).


iii)  Correlation Coefficients. Another problem where the normal
theory procedure is not robust is interval estimation for the
correlation coefficient. The test that $\rho$ equals zero is robust
to non-normality, but for $\rho \ne 0$ the asymptotic variance of
$\hat\theta = \tanh^{-1} r = (1/2)\ln\{(1+r)/(1-r)\}$ is not $1/(n-3)$ unless the
underlying distribution is normal. Duncan and Layard (1973)
studied jackknifing $\hat\theta$ and found that it works well for most
distributions. Recent work on improving the jackknife in
connection with the correlation problem is contained in Hinkley
(1976, 1977a).


iv)  Censored Data. Considerable progress has been made on the
analysis of censored data within the last two decades. In four
landmark articles the product-limit estimator of a distribution
function was introduced by Kaplan and Meier (1958), the log-rank
analysis for two sample tests on censored data appeared in Mantel
and Haenszel (1959), the Wilcoxon rank test was adapted to
censored data by Gehan (1965), and Cox (1972) presented his
conditional likelihood analysis of a proportional hazards model. None
of these procedures requires the services of the jackknife because
the relevant standard errors can be estimated without difficulty.
However, for more complicated censoring and truncation problems
as in Turnbull (1974, 1976) estimation of the standard error
becomes messier and the jackknife may be useful. Similarly, the
standard error for the estimated probability of survival beyond a
specified time for the proportional hazards model with covariates
is sufficiently complicated that the jackknife may be a good way
of estimating it. Preliminary work on the performance of the
jackknife in the presence of censoring appears in Miller (1975)
and Route (1976).

v)  Model Simulation. It is difficult to get analytic answers for
probability models which are sufficiently intricate to accurately
model realistic storage systems, queueing systems, etc. Usually
it is necessary to simulate the system on a computer. The
estimates of the important parameters of the system can sometimes be
improved by jackknifing, and the variability of the parameter
estimates can be assessed by jackknifing. Examples of this can
be found in Gaver (1975, 1977) and Iglehart (1975).
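As an illustration of the first application above (ratios), the sketch below jackknifes the ratio estimator $\bar Y/\bar X$. It is a minimal example under assumptions that go beyond the text: the observations are taken to arrive as pairs $(X_i, Y_i)$ that are deleted together, and the function names and the standard-error formula based on the sample variance of the pseudo-values are illustrative choices.

```python
import math
import statistics
from typing import List, Tuple

Pair = Tuple[float, float]  # (x, y) observation; pairing is an assumption


def ratio(pairs: List[Pair]) -> float:
    """Ad hoc ratio estimator: mean of y divided by mean of x."""
    return sum(y for _, y in pairs) / sum(x for x, _ in pairs)


def jackknife_ratio(pairs: List[Pair]) -> Tuple[float, float]:
    """Return (jackknifed ratio estimate, jackknife standard error)."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two pairs")
    theta_hat = ratio(pairs)
    pseudo = [
        n * theta_hat - (n - 1) * ratio(pairs[:i] + pairs[i + 1:])
        for i in range(n)
    ]
    estimate = statistics.fmean(pseudo)
    std_error = statistics.stdev(pseudo) / math.sqrt(n)
    return estimate, std_error
```

When every pair satisfies $y = cx$ exactly, all deleted estimates equal $c$, so the jackknifed estimate is $c$ and the standard error is zero; this gives a simple check of the implementation.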
MODELING  AND  ESTIMATING  THE  AVAILABILITY
OF  COMPLEX  SYSTEMS: 

THE  JACKKNIFE,  COMMON-CAUSE,  AND  INSPECTION  MODELS 
Donald  P.  Gaver 

Operations  Research  Department 
Naval  Postgraduate  School 
Monterey,  Ca.  93940 


1.  Introduction 

An  important  property  of  any  system  of  cooperative  or 
interacting  components  or  equipments  is  its  availability.  By 
this  is  meant,  roughly  speaking,  the  fraction  of  time  during 
which  the  system  is  operative  and  thus  able  to  perform  its 
intended  function,  and  is  not  down  for  maintenance  or  repair. 

This paper outlines various ways in which component, and then
system, availability may be described, i.e. represented by
mathematical models. In Section 4 it is shown how, in several
cases, operational data may be used to estimate availability,
and also to assess the uncertainty, or error, of the estimates.
The technique used for this purpose here is called the jackknife;
see Mosteller and Tukey (1977), and Gaver and Chu (1977), from
which the present account is borrowed. In Section 6 models for
redundant repairable systems susceptible to common cause
(sometimes termed common mode) failures are described and analyzed.
It is shown that redundancy loses effectiveness when common
cause failures, perhaps caused by external events such as weather
or human error, tend to occur. Finally in Section 7 a sample
model for a standby system subject to periodic inspection is
introduced and examined. Although occasional inspection and
testing of a standby unit, such as a military weapon or a
reactor safety system, is important to detect inoperability,
too-frequent inspection may well increase the likelihood of
failure. The model suggests an optimum, or at least reasonable,
inspection interval as a compromise.

2. Systems and Scenarios

Examples of the kinds of systems we have in mind are
shipboard communications (for a study see Perrin (1975)),
general aircraft, including the engines and avionics, nuclear
reactor safety systems, electric power boilers and generators,
telecommunications systems including those involving satellites,
and computer systems.

Such systems are complex, being made up of various
interacting components, usually including a human link in
either an active or maintenance capacity. The effect of improper
maintenance is addressed in the inspection model of Section 7
but is otherwise ignored. A range of operating scenarios
must be considered. Some are

1)  Equipments always active, except when failed and when
maintenance is carried out; example: a base-loaded
electric power generator powered by a nuclear plant,

2)  Equipments inactive (on "cold standby") unless needed;
examples: most weapons such as missiles, nuclear
reactor safety systems,

3)  Equipments (modules) active unless they are in
maintenance or spare stock; example: replaceable
aircraft engines.

There are other scenarios also; many include a certain amount
of redundancy, i.e. extra equipments to be relied upon in
case one or more of those "on line" fail.

Some appropriate definitions of availability are as
follows:

a)  Availability is the (expected) fraction of time an
equipment is workable or up. Such a definition obviously
relates to productivity of a base-loaded power
generation or propulsion system.

b)  Availability is the probability that a system is up when
needed. Such a definition is suitable for a "cold
standby" system, such as a missile or other weapon, or
a reactor safety system, or perhaps certain
communication devices. To say that the system is "up when
needed" may also imply that the system remains up for
a significant time period thereafter.


3.  A Single  Equipment  Model 

Consider a single equipment, for instance a component
of a system or a system itself, such as a nuclear power plant.
Describe the equipment times to failure or uptimes by random
variables $U_i$, and the subsequent repair times by random
variables $D_i$, $i = 1, 2, \ldots$. Supposing that the system begins up,
then the first cycle terminates at $U_1 + D_1$ with the system
again up; the $i$th cycle duration is $U_i + D_i = C_i$. Then if
$A_U(t)$ is the availability of the system at $t$, given that the
system was initially beginning an up period, and if cycle times
are independent, one arrives at the Volterra integral equation
for $A_U(t)$

$$ A_U(t) = 1 - F_U(t) + \int_0^t A_U(t-x)\,F_C(dx) , \tag{3.1} $$

$F_U$ being the distribution function (d.f.) of $U$, and $F_C$ the
d.f. of a cycle length $C$. Renewal theory shows that if either
$U$ or $D$ or $C$ has an absolutely continuous component,
then

$$ A_U(t) \to \frac{E[U]}{E[U] + E[D]} \equiv A_U \quad (t \to \infty) \tag{3.2} $$


provided the expected values $E[U]$ and $E[D]$ are finite. This
simple expression describes the long-run point availability of
the system. Notice that nothing is said about the independence
of $U$ and the subsequent $D$: examples exist to show that if
$U$ and $D$ are positively related (correlated) the rate of approach
to the value $A_U$ is slower than if they are independent (see
Gaver (1972)). If the equipment is an emergency unit (weapon
or safety system) that is required at a random time $T$, and $T$
has the exponential distribution $F_T(t) = 1 - e^{-st}$, where
$s^{-1} = E[T]$, then the convolution properties of Laplace transforms
show that availability at demand time is, for any $s > 0$,

$$ \hat A_U(s) = \int_0^\infty A_U(t)\,e^{-st}\,s\,dt = \frac{1 - \hat F_U(s)}{1 - \hat F_C(s)} , \tag{3.3} $$

where $\hat F_U(s) = E[e^{-sU}]$, and $\hat F_C(s) = E[e^{-sC}]$. This expression
can easily be evaluated for some familiar distributions (not
the log normal), and $\hat A_U(s)$ approaches $A_U$ as $s \to 0$. Demand
times occurring according to gamma distributions, or even more
general laws, can be handled in similar fashion.
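Expression (3.3) is easy to evaluate once the two transforms are available. The Python sketch below does so for one familiar case, chosen purely for illustration and not taken from the paper: up and down times are assumed independent and exponential with rates `lam` and `mu`, so the cycle transform is the product of the two exponential transforms. All names and numerical values are hypothetical.

```python
from typing import Callable


def availability_at_demand(
    s: float,
    lt_up: Callable[[float], float],
    lt_cycle: Callable[[float], float],
) -> float:
    """Evaluate (1 - E[exp(-sU)]) / (1 - E[exp(-sC)]) for s > 0."""
    return (1.0 - lt_up(s)) / (1.0 - lt_cycle(s))


def exponential_transform(rate: float) -> Callable[[float], float]:
    """Laplace transform s -> rate / (rate + s) of an exponential law."""
    return lambda s: rate / (rate + s)


# Assumed illustration: failure rate lam, repair rate mu, U and D independent.
lam, mu = 0.01, 1.0
lt_up = exponential_transform(lam)
lt_down = exponential_transform(mu)


def lt_cycle(s: float) -> float:
    # C = U + D with U, D independent, so the transforms multiply.
    return lt_up(s) * lt_down(s)


long_run = mu / (lam + mu)  # E[U] / (E[U] + E[D]) for this special case
at_demand = availability_at_demand(0.5, lt_up, lt_cycle)
```

In this special case the ratio simplifies algebraically to $(\mu + s)/(\lambda + \mu + s)$, which tends to the long-run value $\mu/(\lambda + \mu)$ as $s \to 0$, in line with the limiting statement above.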
inferences (statistical estimates) of system availability.
One promising technique for dealing with this problem is the
jackknife; see Mosteller and Tukey (1977), Chap. 8.
Analysis shows that in large samples the jackknife method tends
to remove estimator bias (its originally advertised purpose)
and in addition supplies usefully accurate confidence limits.
We report Monte Carlo simulation results that indicate the
validity of such confidence limits for realistically smallish
numbers of observations as well. In a later section we also
show how the method extends to systems of independently failing,
and independently maintained, equipments.

The approach proceeds by first examining the obvious
point estimate of $A_U$:

$$ \hat A = \frac{\bar u}{\bar u + \bar d} , \tag{4.1} $$

where $\bar u$ and $\bar d$ are the means of the observed up and down
times. We first rewrite it (transform) to consider

$$ z = \ln\frac{\hat A}{1 - \hat A} = \ln \bar u - \ln \bar d . \tag{4.2} $$

The purpose of this transformation is to allow consideration
of a quantity more nearly symmetrical and even normal (Gaussian)
than is $\hat A$ itself. Note that although the log transformation
is likely to be effective other possibilities exist as well;
the cube root or Wilson-Hilferty is a plausible alternative
(see Kendall (1947)), apparently not yet much investigated.
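Having computed $z$ based on all observations one next
recomputes $z$, but leaving out the $j$th pair of observations
($j = 1, 2, \ldots, n$):

$$ z_{-j} = \ln \bar u_{-j} - \ln \bar d_{-j} , $$

where $\bar u_{-j}$ and $\bar d_{-j}$ are the means of the up and down times with the
$j$th pair removed. Here it is assumed that the number of up times and down times
are equal. Next, compute the pseudovalues

$$ z_j = nz - (n-1)z_{-j} , \quad (j = 1, 2, \ldots, n) , $$

and their mean and variance:

$$ \bar z = \frac{1}{n}\sum_{j=1}^{n} z_j , \qquad s_z^2 = \frac{1}{n-1}\sum_{j=1}^{n}(z_j - \bar z)^2 . $$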

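Now quote as point estimate of availability the quantity

$$ \hat A_{JK} = \frac{e^{\bar z}}{1 + e^{\bar z}} $$

obtained by inverting the log-logistic transformation. At the
suggestion of J. W. Tukey (1958), treat the individual $z_j$'s
as approximately independent and Normal and so apply Student's
t to establish approximate confidence limits first on
$\ln(A_U/(1-A_U))$ and then on $A_U$ itself: for two-sided $(1-\alpha)\cdot 100\%$
intervals find

$$ H_\alpha = \bar z + t_{1-\alpha/2}(n-1)\cdot\frac{s_z}{\sqrt{n}} , \qquad L_\alpha = \bar z - t_{1-\alpha/2}(n-1)\cdot\frac{s_z}{\sqrt{n}} , $$

where $t_{1-\alpha/2}(n-1)$ is the Student's t percentage point with $n-1$ degrees
of freedom and $s_z$ is the sample standard deviation of the pseudovalues,
so that, approximately,

$$ L_\alpha \le \ln\frac{A_U}{1 - A_U} \le H_\alpha , $$

and thus also

$$ \frac{e^{L_\alpha}}{1 + e^{L_\alpha}} \le A_U \le \frac{e^{H_\alpha}}{1 + e^{H_\alpha}} $$

with $(1-\alpha)\cdot 100\%$ confidence.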
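The whole procedure of this section (transform, delete one pair at a time, form pseudovalues, apply Student's t, invert) can be sketched as follows. This is an illustrative Python outline rather than the authors' program: the function name is hypothetical, the Student's t percentage point is passed in as `t_crit` (taken from tables for $n - 1$ degrees of freedom), and the standard error is computed as the sample standard deviation of the pseudovalues divided by $\sqrt{n}$.

```python
import math
import statistics
from typing import Sequence, Tuple


def jackknife_availability(
    up: Sequence[float], down: Sequence[float], t_crit: float
) -> Tuple[float, float, float]:
    """Return (point estimate, lower limit, upper limit) for availability."""
    n = len(up)
    if n != len(down) or n < 2:
        raise ValueError("need equal numbers (at least 2) of up and down times")
    total_up, total_down = sum(up), sum(down)
    # z = ln(u_bar) - ln(d_bar); with equal counts this is ln(sum_up / sum_down).
    z = math.log(total_up / total_down)
    pseudo = []
    for u, d in zip(up, down):
        # Recompute z with the j-th (up, down) pair left out.
        z_minus_j = math.log((total_up - u) / (total_down - d))
        pseudo.append(n * z - (n - 1) * z_minus_j)
    z_bar = statistics.fmean(pseudo)
    half_width = t_crit * statistics.stdev(pseudo) / math.sqrt(n)

    def invert(v: float) -> float:
        # Inverse of z = ln(A / (1 - A)).
        return 1.0 / (1.0 + math.exp(-v))

    return invert(z_bar), invert(z_bar - half_width), invert(z_bar + half_width)
```

For example, with five observed cycles one would call the function with `t_crit=2.776`, the two-sided 95% point of Student's t with 4 degrees of freedom. Because the limits are formed on the transformed scale and then inverted, they always lie strictly between 0 and 1.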

Asymptotic techniques ($n$ large) of R. Miller (1964)
will show that this procedure tends to be valid. That it is
also robust of validity (coverage of the true availability
reasonably close to stated 95% for a variety of distributions
of up and down times) is borne out by simulations; see the
following tables for $n = 15$ and $n = 25$. Distributions
considered are these:

A.  $U_i$ and $D_i$ mutually independent and each
exponential; $E[U_i] = \lambda^{-1}$, $E[D_i] = \mu^{-1}$.

B.  $U_i$ independently exponential and $D_i$ independently
gamma with shape parameter $k = 3$.

C.  $U_i$ independently exponential and $D_i$ gamma and
independent with the gamma proportional to the
preceding exponential up time; shape parameter
$k = 2$.

D.  $U_i$ independently "long-tailed h" (i.e. $U = Xe^{hX}$,
with $X$ exponential, adjusted for desired mean and
variance), $D_i$ independently exponential.

The long-tailed h distributions of case D are introduced to
represent data appearing nearly exponential for small-to-medium
values but that have long tails. For more details see Gaver
and Chu (1977), and Gaver (1978). Thus an attempt has been
made by means of the above four distributional forms to deal
with data of a reasonable and plausible variety. This is
necessary, for there is little chance that the "correct
distributions" can be identified from the data itself in an applied
situation. Notice that in the case of data model A (ups and
downs independently exponential) an exact solution is available
in terms of the classical F-statistic, for $\bar d/\bar u$ is seen to
be a ratio of independent chi-squares. Acting as if the "F"
procedure is applicable in every case considered is clearly
less valid than is the jackknife, as the tables show.
5. The Jackknife Applied to System Availability Estimation

The jackknife technique can also be applied to estimate
the availability of systems of equipments; in fact, this may
be its most important application. We indicate by some examples
the effectiveness of the procedure.

K-Component Redundant--Identical Units

If $K$ units are in parallel, and all must be down in
order for the system to be down, then long-run unavailability
is, under independence assumptions,

$$ \bar A = \prod_{i=1}^{K} \frac{E[D_i]}{E[U_i] + E[D_i]} = \left(\frac{E[D]}{E[U] + E[D]}\right)^{K} , \tag{5.1} $$

which would naturally be estimated by

$$ \hat{\bar A} = \left(\frac{\bar d}{\bar u + \bar d}\right)^{K} . $$

In terms of the transformed quantity $z = \ln \bar u - \ln \bar d$ of (4.2) this
estimate is $(1 + e^{z})^{-K}$,
and so one merely jackknifes $z$ as before and inverts to put
$(1-\alpha)\cdot 100\%$ confidence limits on $\bar A$:

$$ (1 + e^{H_\alpha})^{-K} \le \bar A \le (1 + e^{L_\alpha})^{-K} . \tag{5.4} $$


K-Component Redundant--Different Units

If the units are unlike it is plausible to jackknife

$$ \ln \hat{\bar A} = \sum_{i=1}^{K} \ln\left(\frac{\bar d_i}{\bar u_i + \bar d_i}\right) . \tag{5.5} $$

The straightforward way of carrying this out is to compute as
before the $j$th pseudovalue $l_{ij}$ of $\ln \hat{\bar A}_i$ for $i = 1, 2, \ldots, K$,
find its mean and variance, denoted by $\bar l_i$ and $s_i^2$. Last,
combine to obtain

$$ \bar l = \sum_{i=1}^{K} \bar l_i , \quad \text{and} \quad s^2 = \sum_{i=1}^{K} \frac{s_i^2}{n_i} . \tag{5.6} $$

Upper and lower confidence limits on $\ln \bar A$ are then of the form

$$ H_\alpha = \bar l + t_{1-\alpha/2}\Big(\sum_{i=1}^{K} n_i - K\Big)\cdot s , \qquad L_\alpha = \bar l - t_{1-\alpha/2}\Big(\sum_{i=1}^{K} n_i - K\Big)\cdot s . $$

These may be translated to limits on $\bar A$, and on $A$. An alternative
procedure is one of linearization around the jackknifed point
estimate of $\ln \bar A$; for details see Gaver and Chu (1977).

Some Monte Carlo simulation results are exhibited
in Table 3. Once again the results seem usefully valid and
efficient.
0.01,  λ = 0.02   JK,2:  93.0   5.1 x 10^-4   6.0


Two-Out-of-Three Voting

A final example is provided by a system of three units
that is considered available if any two are simultaneously
available. Thus

$$A = A_1 A_2 A_3 + \bar{A}_1 A_2 A_3 + A_1 \bar{A}_2 A_3 + A_1 A_2 \bar{A}_3 .$$

As  usual,  up  and  down  time  data  are  assumed  to  be  available  on 
all  three  units;  we  do  not  wish  to  assume  them  identical. 

One procedure is as follows. First compute the jackknifed point
estimate of system availability. Next consider the log-logistic
transformation $\ell = \ln[A/(1-A)]$, and expand to linear terms
around the jackknifed point estimate $A^*$, thus finding an expression
for $s_\ell^2$; for further details see Gaver and Chu (1977).
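
As a small check on the two-out-of-three structure, the sketch below evaluates the system availability from three unit availabilities and applies the transformation ln[A/(1-A)]. It assumes independent units; the function names and the numerical unit availabilities are illustrative only, and the linearization step itself (Gaver and Chu (1977)) is not reproduced here.

```python
import math
from itertools import product


def two_of_three(a1, a2, a3):
    """Availability of a 2-out-of-3 system of independent units."""
    return (a1 * a2 * a3
            + (1 - a1) * a2 * a3
            + a1 * (1 - a2) * a3
            + a1 * a2 * (1 - a3))


def two_of_three_by_enumeration(a1, a2, a3):
    """Cross-check: add up the probabilities of all states with >= 2 units up."""
    total = 0.0
    for state in product((0, 1), repeat=3):
        if sum(state) >= 2:
            p = 1.0
            for up, a in zip(state, (a1, a2, a3)):
                p *= a if up else (1 - a)
            total += p
    return total


def logit(a):
    """ln[A / (1 - A)], the transformation applied before forming limits."""
    return math.log(a / (1 - a))


def inverse_logit(x):
    """Map a limit on the transformed scale back to an availability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


a_sys = two_of_three(0.99, 0.98, 0.96)  # illustrative unit availabilities
print(a_sys, logit(a_sys))
```

Limits formed on the transformed scale and mapped back with `inverse_logit` always fall strictly between 0 and 1, which is the practical appeal of the transformation.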
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0.01,  X = 0.02,  x ~ 0.04 


6.  Common-Cause Failure Models


The  previously  discussed  models  for  availability  of  systems 
assumed  that  the  component  equipments  failed  and  were  repaired 
independently.  Such  an  assumption  is  often  inappropriate:  common 
causes  of  failure,  such  as  environmental  shock  or  personnel  error, 
may  well  be  decisive.  We  present  now  a simple  model  for  catastrophic 
common-cause  failure. 

A Repairable  System  Experiencing  Common  Cause  Failure 

Consider a system of m ($m \ge 1$) identical equipments,
each one of which fails independently with rate $\lambda$ (exponential
time to failure), and is repaired (after an exponential time)
with rate $\mu$. The system is also confronted by a common-cause
failure mechanism, such that when it is activated the system fails
completely. The rate of occurrence of the latter is c. Rule
that the system is operative or up so long as k out of m
($1 \le k \le m$) units operate. The system fails as soon as at least
$\ell = m-k+1$ units are down simultaneously. The problem addressed
here is to calculate the expected time to system failure, where
failure may occur either because of the individual machine chance
failures, or because of the common-cause catastrophic event.

Analysis of the model may be conducted in terms of the
state variable D(t); D(t) = j means that j units are on
repair at time t. Clearly D(t) is a Markov process, and its
state transition rates are specified as follows: given D(t) = j,
then
Change                      Probability

D(t + dt) = D(t) + 1        $\lambda_j\,dt$
D(t + dt) = D(t) - 1        $\mu_j\,dt$
D(t + dt) = m               $c\,dt$
D(t + dt) = D(t)            $1 - (\lambda_j + \mu_j + c)\,dt$


All other probabilities are negligible. Of course $\lambda_j$ and $\mu_j$
may be specified so as to represent any kind of system; for
instance, one in which there are limited numbers of repairmen and
thus queueing occasionally occurs, or one in which not all units
are simultaneously operative and susceptible to failure. Here we
specify these parameters to be

$$\lambda_j = (m - j)\lambda, \qquad \mu_j = \min(j, r)\,\mu, \qquad j = 0, 1, \ldots, m \qquad (6.1)$$

where r ($1 \le r \le m$) is the number of repairmen available to
work simultaneously. Furthermore, r = m in the numerical examples.

The process {D(t)} is actually birth-and-death (see
Feller (1957)) with an independent Markovian killing process. Denote
by $T_\ell$ the elapsed time for the system to pass for the first time
from D(0) = 0 (no element down) to the state $\ell$ or greater, at
which point system failure occurs. Note that

$$P\{T_\ell > t \mid D(0)=0\} = P\{T_\ell^* > t \mid D(0)=0\}\, e^{-ct} \qquad (6.2)$$

where $T_\ell^*$ is the first passage time to $\ell$ in the ordinary birth-
and-death process that admits no catastrophes. Equation (6.2)
simply expresses the fact that failure time exceeds t if and only
if neither a chance failure nor a catastrophic failure occurs
before t.

Now Laplace transform (6.2) to obtain

$$\int_0^\infty e^{-st} P\{T_\ell > t \mid D(0)=0\}\,dt = \int_0^\infty e^{-st} P\{T_\ell^* > t \mid D(0)=0\}\, e^{-ct}\,dt = \frac{1}{s+c}\left\{1 - E\left[e^{-(s+c)T_\ell^*}\right]\right\}, \qquad (6.3)$$

the latter following by an integration by parts. The Laplace
transform of $T_\ell^*$ is of the form

$$E\left[e^{-(s+c)T_\ell^*}\right] = \prod_{i=0}^{\ell-1} \phi_i(s+c) \qquad (6.4)$$

where

$$\phi_i(s+c) = \frac{\lambda_i}{\lambda_i + \mu_i + s + c - \mu_i\, \phi_{i-1}(s+c)}, \qquad i = 1, 2, 3, \ldots,$$

and

$$\phi_0(s+c) = \frac{\lambda_0}{\lambda_0 + s + c}; \qquad (6.5)$$

see Karlin and Taylor (1975). By combining (6.3) and (6.4) one
may calculate (6.2); then, allowing $s \to 0$ there results

$$E[T_\ell] = \frac{1 - E\left[e^{-cT_\ell^*}\right]}{c} .$$


Numerical Example

Let m = 3, k = 1, $\ell = 3$, meaning that the system of three
equipments fails only when all are down simultaneously. Put
$\lambda = 10^{-2}$ (days$^{-1}$), $\mu = 1$ (days$^{-1}$), and consider the effect of varying
the catastrophe rate, c.

$E[T_\ell]$               c (Catastrophe Rate)

$3.5 \times 10^{5}$       0
$1 \times 10^{4}$         $10^{-4}$
$1 \times 10^{3}$         $10^{-3}$

Obviously a catastrophe rate as great as $10^{-4}$ completely dominates
the  effect  of  the  individual  unit  chance  failures.  Thus  only  if 
the  catastrophe  rate  is  of  magnitude  10 or  smaller  will  the 
present  redundancy  be  at  all  effective. 
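
The numerical example can be checked with a short Python sketch of the recursion for the first-passage transform. It assumes the rates of (6.1) with r = m by default; the function names are hypothetical. For c = 0 the formula (1 - E[exp(-cT*)])/c is indeterminate, so the sketch falls back on the standard mean first-passage recursion for a birth-and-death process, which is an addition made here for convenience and not a step taken from the text.

```python
def rates(j, m, r, lam, mu):
    """Failure and repair rates in state j, following (6.1)."""
    return (m - j) * lam, min(j, r) * mu


def mean_time_to_failure(m, k, lam, mu, c, r=None):
    """Expected time from 'no unit down' to system failure.

    The system fails when ell = m - k + 1 units are down at once, or when
    the common-cause event (rate c) occurs, whichever comes first.
    """
    r = m if r is None else r
    ell = m - k + 1
    if c == 0.0:
        # No catastrophes: add up mean passage times from state i to i + 1,
        # g_i = (1 + mu_i * g_{i-1}) / lambda_i.
        total, g_prev = 0.0, 0.0
        for i in range(ell):
            lam_i, mu_i = rates(i, m, r, lam, mu)
            g_prev = (1.0 + mu_i * g_prev) / lam_i
            total += g_prev
        return total
    # With catastrophes: product of phi_i(c), then (1 - product) / c.
    prod, phi_prev = 1.0, 0.0
    for i in range(ell):
        lam_i, mu_i = rates(i, m, r, lam, mu)
        phi_prev = lam_i / (lam_i + mu_i + c - mu_i * phi_prev)
        prod *= phi_prev
    return (1.0 - prod) / c


for c in (0.0, 1e-4, 1e-3):
    print(c, mean_time_to_failure(m=3, k=1, lam=0.01, mu=1.0, c=c))
```

With these inputs the three values come out near 3.5 x 10^5, 1 x 10^4 and 1 x 10^3 days, in line with the table.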


7.  An Inspected System

In this section we turn attention to an equipment that
is not expected to operate constantly, but that is intended to be
ready when needed. An example is a weapon such as a gun or missile;
another example is an alarm or safety system, perhaps associated
with a nuclear power system.

Attempts to ensure the operability or readiness of such
systems usually include periodic inspection and preventive
maintenance. Our model incorporates these attributes; furthermore,
it allows for imperfection in the inspection-repair process, e.g.
brought about by human error.

The  Model 

A single equipment is subject to periodic inspections and
preventive maintenance or repair actions. Let the time from the
completion of a preventive maintenance period until the beginning
of the next be I time units, and let the subsequent preventive
maintenance period require R time units; both I and R will
be taken to be fixed. Hence over a long period (say one year)
the system presents itself as nominally "ready" a fraction of
time equal to I/(I + R), and down for inspection and maintenance,
and hence unavailable, for a fraction of time R/(I + R). Now
admit the possibility that the system be additionally unavailable
for one of two reasons: (a) at the end of an inspection-
maintenance period the equipment is returned to active service
in an inoperative condition, an event of probability $\delta$ ($0 < \delta < 1$);
(b) at the end of an inspection-maintenance period the equipment
is up, an event of probability $\bar{\delta} = 1 - \delta$, but it fails before the
next inspection, and is thus actually unavailable for the time
following that failure until the next inspection-repair period
begins. Let F(t) be the distribution function, and f(t) the
density, of equipment failure time. If the inspection interval,
I, is treated as a decision variable it is interesting to select
its value so as to maximize long-run availability, or, equivalently,
to minimize long-run unavailability. In order to do so, first
calculate the expected time unavailable during one cycle of
length I + R:

$$\delta I + \bar{\delta} \int_0^I (I - t)\, f(t)\,dt + R ;$$


division  by  I + R then  gives  the  expected  unavailable  time  per 
unit  time  as  the  latter  depends  upon  I: 


$$\bar{A}(I) = \frac{\delta I + \bar{\delta} \int_0^I (I - t)\, f(t)\,dt + R}{I + R} = 1 + \bar{\delta}\, \frac{-I + \int_0^I (I - t)\, f(t)\,dt}{I + R} \qquad (7.1)$$


One  may  now  choose  I so  as  to  minimize  A;  differentiation  shows 
that  the  optimum  I must  satisfy  the  equation 


$$R = \frac{I\,F(I) - \int_0^I F(t)\,dt}{1 - F(I)} = \frac{\int_0^I t\, f(t)\,dt}{1 - F(I)} \qquad (7.2)$$


Since the denominator is a decreasing, and the numerator an
increasing, function of I there is exactly one root of (7.2).
Surprisingly at first glance, the optimum inspection interval,
$I_{opt}$, does not depend upon $\delta$, the probability of a failure during
the inspection-preventive maintenance period. Of course the
eventual system availability does depend upon this parameter;
providing $I_{opt}$ is chosen it turns out that

$$\bar{A}(I_{opt}) = 1 - \bar{\delta}\,[1 - F(I_{opt})] .$$

Although (7.2) cannot usually be solved explicitly it turns out,
in the case of exponential failures, to be

$$R = \frac{e^{\lambda I} - (1 + \lambda I)}{\lambda} \approx \frac{\lambda I^2}{2}$$

when $\lambda I$ is small, so in this case

$$I_{opt} \approx \sqrt{2R/\lambda}$$

and

$$\bar{A}(I_{opt}) \approx 1 - \bar{\delta} \exp(-\sqrt{2\lambda R}) ,$$

or, in terms of availability,

$$A(I_{opt}) \approx \bar{\delta} \exp(-\sqrt{2\lambda R}) .$$
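
For exponential failures the root of (7.2) is easy to find numerically, which gives a check on the small-interval approximation. The sketch below is illustrative only: it assumes an exponential failure time with rate lam, solves (7.2) by bisection, and compares the result with the square-root approximation. The function names and parameter values are hypothetical.

```python
import math


def unavailability(interval, repair, lam, delta):
    """Long-run unavailability (7.1) when failure times are exponential(lam)."""
    # Expected operative time per cycle, given the unit came back up:
    # integral of exp(-lam * t) over one inspection interval.
    expected_up = (1.0 - math.exp(-lam * interval)) / lam
    return 1.0 - (1.0 - delta) * expected_up / (interval + repair)


def optimal_interval(repair, lam):
    """Solve R = [exp(lam*I) - (1 + lam*I)] / lam for I by bisection."""
    def excess(interval):
        return (math.expm1(lam * interval) - lam * interval) / lam - repair

    lo, hi = 0.0, 1.0
    while excess(hi) < 0.0:   # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):      # the left side increases with I, so bisect
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


repair, lam, delta = 0.5, 0.001, 0.05     # illustrative values only
i_opt = optimal_interval(repair, lam)
i_approx = math.sqrt(2.0 * repair / lam)  # small lam * I approximation
print(i_opt, i_approx, unavailability(i_opt, repair, lam, delta))
```

The inspection-error probability delta enters only `unavailability`, not `optimal_interval`, which mirrors the remark that the optimum interval does not depend on it.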
QUALITATIVE EVALUATION OF THE M60A1 TANK CAMOUFLAGE
BY OPERATIONAL IMAGERY INTERPRETERS

EDWARD R. EICHELMAN
AND
RONALD L. JOHNSON

US ARMY MOBILITY EQUIPMENT RESEARCH AND DEVELOPMENT COMMAND
FORT BELVOIR, VIRGINIA 22060

ABSTRACT. A continuing problem in the assessment of camouflage
effectiveness has been the objective analysis of subjective data. This
paper is concerned with such an evaluation for an M60A1 Tank. Thirty
operational image interpreters analyzed the following camouflage
prototypes: natural foliage, fender nets, two styles of gun barrel
disrupters, and countershading. Each interpreter viewed aerial imagery
of each condition. A forced choice of descending ratings was assigned.
Mean ratings and associated variances were calculated. The scores were
standardized, and the Z statistic was employed to determine significant
differences. The effectiveness of foliage was significantly better,
α = .01, than countershading.

I.  INTRODUCTION.

Up through World War II the development of camouflage involved a
subjective, artistic approach rather than the scientific method now
advocated. With the advent of more complex sensor systems the
development of camouflage concepts has necessitated a more controlled
approach based on stringently quantified data. The results of the
analysis are then used as a data base to identify the most promising
camouflage concepts for further development.

One such instance in which the U.S. Army is involved is the tactical
camouflage of the M60A1 combat tank. The purpose of this study was to
objectively evaluate the effectiveness of various prototype camouflage
items for the M60A1 tank. It was accomplished through the use of
operational image interpreters (II's).

II.  DESIGN  OF  EXPERIMENT. 

A.  Targets. The test targets consisted of M60A1 tanks in the
following conditions:

a.  Pattern painted.

b.  Pattern painted and natural foliage.

c.  Pattern painted, countershading, and gun barrel disrupter
(Type I).

d.  Pattern painted, fender nets, and gun barrel disrupter
(Type II).

These various conditions of camouflage will now be described in detail.

1.  Pattern Paint.

The purpose of the camouflage paint patterns is to distort
straight lines and edges of objects, alter perception of depths, and
to reduce contrast with the surroundings and cause the object to blend
with its background¹. Camouflage paint patterns were developed by
the U.S. Army Mobility Equipment Research and Development Command
(MERADCOM). The pattern used in this test combines patches of the
colors forest green, light green, sand, and black². It is the Summer
U.S. and European verdant pattern.
2.  Natural Foliage.

The natural foliage was inserted into specially placed brackets to
disrupt the target's outline and distinct features. It was also intended
to reduce the target to background contrast.

3.  Countershading and Type I Gun Barrel Disrupter.

Countershading of the target consists of painting the normally dark
or shadowed areas with light colors (e.g., white or gray) to reduce
detection and identification by means of these visual contrast cues.
The Type I gun barrel disrupter is an accordion-like sleeve that slips
over the gun barrel to break up the parallel straight edges as well as
to distort its shadow.

4.  Fender Nets and Type II Gun Barrel Disrupter.

The final camouflage condition evaluated contained fender nets and
a Type II gun barrel disrupter. Fender nets were designed to cover the
visual cues of the tank's track system and lower portion of the hull.
They consist of six-foot-long fiber glass rods supporting plastic garnish
material from the Army's standard lightweight camouflage screening system
(LWCSS). The Type II gun barrel disrupter is of an irregular fan-shaped
design which is attached along the top of the gun barrel.


The test imagery consists of a series of 4" X 5" color positives for
each of the camouflage conditions. Scaled aerial photographs at 1:10,000
and 1:5,000 were taken of the front, back, top, and both sides of each
target M60A1 tank. Additional ground level photographs were taken of
each target for documentation. The target tanks were sited so that they
were unobstructed by indigenous foliage.


C.  Test Procedure.

Thirty operational image interpreters (II's) participated in the
camouflage evaluation. They were first shown the close-up, ground
level pictures of the camouflaged tanks and given a brief description
of the purpose of each type of camouflage. The pattern painted tank
was defined as the base condition upon which the five types of
camouflage were applied. They were then shown all of the 4" X 5" color
positives of the camouflage conditions for evaluation. In order to
provide objective results from this study, the II's were instructed
to make a forced choice in analyzing the effectiveness of five types
of camouflage. The ranking choices were as follows:

1.  Most effective

2.  Above average effectiveness

3.  Average effectiveness

4.  Below average effectiveness

5.  Least effective

III.  EXPERIMENTAL RESULTS.

The dependent variable of this test is the frequency with which each
prototype camouflage was assigned a particular effectiveness value by the
II's. The forced selection of effectiveness allowed the conversion of the
subjective data into objective results. Figure 1 shows the cumulative
totals of the forced selection of effectiveness. As an example, for
countershading, the left end of the lower line with diamond points indicates
zero choices as No. 1 (most effective); two choices as No. 2 (above average
effectiveness); two more choices as No. 3 (average effectiveness) for a
total of four; seven more as No. 4 (below average effectiveness) for a total
of eleven; and finally, nineteen more as No. 5 (least effective) for a
total of thirty.

FIGURE 1.  GRAPHIC RESULTS OF EVALUATION OF M-60A1 CAMOUFLAGE BY
OPERATIONAL IMAGE INTERPRETERS
[Plot of cumulative number of selections against camouflage effectiveness
rating. Key: 1 - most effective; 2 - above average effectiveness;
3 - average effectiveness; 4 - below average effectiveness; 5 - least
effective. Curves: foliage with brackets; fender nets; gun barrel
disrupter Type I; gun barrel disrupter Type II; countershading.]

Table I is a numerical summary of the II data by camouflage
effectiveness rating versus the type of camouflage.

The means and associated standard deviations were compared, using the
Z statistic⁶, to determine which means are statistically different from
each other. The results are presented in Table II.

Any Z value greater than 2.576 indicates significance at an α of .01,
shown by the values with asterisks.
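
The comparison can be illustrated with a short Python sketch that starts from the selection counts of Table I, forms the weighted mean and sample standard deviation for each camouflage type, and computes Z for a pair of types. It is a sketch under assumptions rather than the authors' computation: the sample (N - 1) form of the variance is assumed because it matches the printed standard deviations, the dictionary keys are labels chosen here, and the Type I count for rating 3 is taken as 7, the value implied by the column total of 30 and the printed mean of 2.70.

```python
import math

WEIGHTS = [5, 4, 3, 2, 1]  # weight 5 corresponds to "most effective"

# Number of interpreters (out of 30) choosing each rating, most to least
# effective, for each camouflage type.
COUNTS = {
    "foliage":        [27, 2, 1, 0, 0],
    "fender nets":    [1, 11, 9, 6, 3],
    "disrupter I":    [0, 9, 7, 10, 4],   # 7 is implied by the column total
    "disrupter II":   [2, 6, 11, 7, 4],
    "countershading": [0, 2, 2, 7, 19],
}


def mean_and_variance(counts):
    """Weighted mean and sample variance of the ratings for one type."""
    n = sum(counts)
    mean = sum(w * c for w, c in zip(WEIGHTS, counts)) / n
    ss = sum(c * (w - mean) ** 2 for w, c in zip(WEIGHTS, counts))
    return n, mean, ss / (n - 1)


def z_score(type_a, type_b):
    """Z = (mean_a - mean_b) / sqrt(var_a / n_a + var_b / n_b)."""
    n_a, mean_a, var_a = mean_and_variance(COUNTS[type_a])
    n_b, mean_b, var_b = mean_and_variance(COUNTS[type_b])
    return (mean_a - mean_b) / math.sqrt(var_a / n_a + var_b / n_b)


for name, counts in COUNTS.items():
    n, mean, var = mean_and_variance(counts)
    print(f"{name:15s} mean = {mean:.2f}  sd = {math.sqrt(var):.3f}")

print(round(z_score("foliage", "fender nets"), 2))
print(round(z_score("fender nets", "disrupter I"), 2))
```

Computed this way, the foliage versus fender nets comparison comes out near the tabled 8.72, and fender nets versus the Type I disrupter near 1.22, which is below the 2.576 threshold.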

IV.  DISCUSSION. 

The stated purpose of this study was to quantify the subjective
evaluations of prototype camouflage for the M60A1 Tank. The problem faced
in this study was one of obtaining objective data from facts that were
subjective in origin. Four by five inch color positives were obtained of
the M60A1 tank for four conditions of prototype camouflage. Photographs of
the pattern painted tank were used as the base case. Thirty operational
II's were shown all of the imagery. They used the forced choice rating
technique to determine the effectiveness of the camouflage conditions on a
five point scale, with one being most effective. The mean and standard
deviation were determined for the frequency with which rating values were
assigned to each condition of camouflage. The means and associated
standard deviations were then subjected to the Z statistic to determine
which condition of camouflage was significantly most or least effective.
The resulting data was successfully used to determine the most promising
candidates for further development.
TABLE I

 CAMO                         FENDER   GUN BARREL   GUN BARREL   COUNTER
 EFFECT   WEIGHT   FOLIAGE    NETS     DISRUPTER    DISRUPTER    SHADING
                                       TYPE I       TYPE II

   1        5        27         1         0            2            0
   2        4         2        11         9            6            2
   3        3         1         9         7           11            2
   4        2         0         6        10            7            7
   5        1         0         3         4            4           19

 MEAN               4.87      3.03      2.70         2.83         1.57
 STANDARD
 DEVIATION          .434     1.066     1.055        1.115         .893
TABLE II

Z OR STANDARD SCORES
FOR COMPARISON OF WEIGHTED MEAN CAMOUFLAGE
EFFECTIVENESS VALUES

                                         GUN BARREL   GUN BARREL
                               FENDER    DISRUPTER    DISRUPTER    COUNTER
                     FOLIAGE   NETS      TYPE I       TYPE II      SHADING

 FOLIAGE
 FENDER NETS          8.72*
 GUN BARREL
 DISRUPTER TYPE I    15.20*    1.22
 GUN BARREL
 DISRUPTER TYPE II   10.54*    0.71      0.48
 COUNTER SHADING     18.13*    4.67*     4.48*        4.84*

*SIGNIFICANT AT α = .01

$Z_{.005} = 2.576$, α = .01, N = 30

$$Z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1^2}{N_1} + \dfrac{S_2^2}{N_2}}}$$
V.  SUMMARY AND CONCLUSIONS.

Thirty operational II's evaluated tank camouflage effectiveness from
4" X 5" color positive aerial photographs of the camouflaged M60A1 Tanks.
The means from a forced choice evaluation of the conditions of camouflage
were objectively evaluated by use of the Z statistic. The data from Tables
I and II significantly (α = .01) indicate that the use of natural foliage
provides the best camouflage of those evaluated. The use of countershading
has little or no camouflage effect or value. Fender nets and two types of
gun barrel disrupters were significantly (α = .01) better than the
countershading, but significantly (α = .01) inferior to foliage. Fender
nets and the types of gun barrel disrupter did not differ significantly in
camouflage effectiveness from each other. From the results of this study
it was recommended that the use of foliage, fender nets and gun barrel
disrupters, Types I and II, be subjected to additional development and
testing. Countershading was not recommended. It is also noted that the use
of forced choice rating can be very successful in an objective evaluation
of data that is subjective in origin.
ABSTRACT. Of particular interest to the Army is a reliable voice
communication capability for helicopters that fly at Nap-of-the-Earth
(NOE) altitudes. This flight regime at tree-top level or below is necessary
for aircraft survival in the modern battlefield. At these altitudes, the
present aircraft VHF/FM radio systems operate over only extremely short
ranges and are essentially limited to line-of-sight (LOS) paths.

To quantitatively assess the performance and effectiveness of the nine
candidate radio systems (both VHF/FM and HF/SSB) and communication methods,
a large scale combined operational and engineering test was designed. The
experiment design considered variables including range, altitude, terrain,
time of day, frequency, and power that affect the radio channel (SNR). The
tests were designed to determine how the performance of the non-LOS and LOS
radio systems depended on these major variables. The test, conducted over a
three-month period, involved over 100 personnel and 1000 hours of flight
testing, and utilized over 10,000 alpha-numeric (A-N) test messages to
determine and evaluate the voice intelligibility of the radio systems.

This paper deals with a definition of the problem and development of
measures of effectiveness (MOEs) to measure radio performance, the design of
the experiment, and how the variables and dimensions of the test were
treated. Although statistical principles were considered, a rigorous
statistical design was not used; however, probability theory techniques
were used for extension of the results to other terrains. Results are
briefly discussed. Lessons learned from the tests are also summarized with
recommendations given which could be applied to future operational tests of
this nature.

1.0  INTRODUCTION. The Army is currently faced with a serious radio
communication problem: communicating with the helicopter on the modern
battlefield.

In order to survive on the modern battlefield aircraft must fly close to
the surface of the earth in a Nap-of-the-Earth (NOE) region [1]. The NOE
flight regime for helicopters is flying at extremely low altitudes, typically
hover altitudes, at relatively low speeds below tree-top level in the battle
area. Aircraft must fly at NOE altitudes to take maximum advantage of the
terrain features for cover. Survivability and mission effectiveness in the
battle area depend on how well the aircraft and crew can function under
these strained conditions and how well communications can be maintained with
the elements being supported. The problem of how to effectively communicate
in the battle area while flying NOE resulted in the conduct of a full-scale
operational field test of nine different communication systems. The main
objective of the test was to compare and evaluate the communication
effectiveness of the candidate radio systems under NOE conditions. The
presently used tactical VHF/FM radio system was considered the baseline
system for the tests.

Many variables existed for the NOE Communications test. Figure 1 shows
the major test variables.

Variable                         Condition

Spatial:                         Range, Terrain, Altitude, Siting

Time of Day:                     Day, Night, Dawn

Frequency Band/Modulation:       HF/SSB (2-8 MHz, below MUF)
                                 HF/SSB (8-30 MHz, above MUF)
                                 VHF/FM (30-76 MHz)

Power Output:                    HF (40, 100, 200, 400 W PEP)
                                 VHF (10, 40 W)

System Configuration (Links):    Air-Ground
                                 Ground-Air
                                 Air-Air

FIGURE 1.  TEST VARIABLES


These variables, and others, were considered in the design of the test
to determine how communications range was affected with aircraft operating at
various altitudes in various types of terrain conditions. The tests
described in this paper were supplemented by other engineering tests and by
computer predictions of communications in operationally significant areas
such as Europe, the Mid-East, and Korea.

2.0  DESIGN  OF  THE  TEST. 

2.1  Measures of Effectiveness. To comparatively evaluate the
performance of the candidate systems, two measures of effectiveness (MOEs)
were developed.

2.1.1  Alpha-Numeric Test Messages. The first measure was a
measure of communications effectiveness using randomly selected alpha-numeric
(A-N) characters sent through the radio channel. Communication effectiveness
was defined as the percent of A-N characters correctly received, sent one way
without repeats through the communication channel. This measure provided a
quantitative comparison of each of the candidate radio systems as a function
of the range and other test variables.

A test message containing an equal number of randomly selected letters
and numbers was developed. This was called an A-N test message. The A-N
test messages were formatted and transmitted as tactical spot reports by the
tester. The tester determined that messages sent in this spot report format
operationally resemble grid or target coordinates that helicopters routinely
transmit over radio systems. Further, spot reports in this format sent one
way through the channel without repeats are demanding on the communication
channel. Finally, A-N messages in this format can be practically recorded in
the helicopter by a test observer and graded at the end of the mission.

Figure 2 shows a typical data recording sheet. A word consists of six
randomly selected A-N characters. In this message characters and numbers are
sent using the phonetic alphabet. These messages were copied down on answer
sheets such as shown, graded and used as the primary measure of effectiveness
for the tests.
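
The grading step can be illustrated with a short Python sketch. It is not the tester's procedure: it assumes six-character words with three letters and three digits in random order, and it grades position by position, counting any character that was not copied as an error. The function names are hypothetical.

```python
import random
import string


def make_word(rng, length=6):
    """One A-N word with equal numbers of random letters and digits."""
    half = length // 2
    chars = [rng.choice(string.ascii_uppercase) for _ in range(half)]
    chars += [rng.choice(string.digits) for _ in range(length - half)]
    rng.shuffle(chars)  # mix letters and digits within the word
    return "".join(chars)


def percent_correct(sent, received):
    """Percent of sent characters copied correctly, position by position.

    Characters missing from the received copy count as errors because the
    denominator is the number of characters sent.
    """
    correct = sum(1 for s, r in zip(sent, received) if s == r)
    return 100.0 * correct / len(sent)


rng = random.Random(1979)                  # fixed seed for a repeatable message
message = [make_word(rng) for _ in range(5)]
print(message)
print(percent_correct("A1B2C3", "A1X2C9"))  # 4 of 6 characters correct
```

Scores from many such messages, grouped by radio system and test condition, would give the percent-correct values used as the primary measure.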


Figure 2.  Data Sheet
2.1.2  Height to Break Squelch. The second measure was the
altitude required to establish two-way communications from the aircraft to
the base station. In this case the aircraft would climb to whatever altitude
was required to establish two-way communications to the base station. This
provided an estimate of the vulnerability of the aircraft containing a
candidate radio system to the enemy weapons threat. The units for the measure
would be the height, in feet, above ground level (AGL) required to
communicate above an NOE-situated site. This measure was made for the
baseline system only and not for all of the candidates due to testing time
limitations.

2.2  Test Variables. Many variables affected two-way helicopter
communications (Figure 1). The principal variables were range, altitude, and
terrain; next in importance was transmitter power used in the aircraft.
Finally of importance was the link tested. Links (or modes) are air-to-
ground (A-G), ground-to-air (G-A), and air-to-air (A-A). Performance over
these links differs.

The range and altitude variables are shown in Figure 3 in the form of a
range/height matrix. Ranges at which the communications equipment were
tested were selected to include the failure range for the candidate systems,
particularly the VHF/FM systems which operate under near LOS conditions.

FIGURE 3.  RANGE-HEIGHT CELLS

2.2.1  Range. The range intervals at which the communication
systems were tested were spaced logarithmically at operationally significant
distances of 1, 2.5, 5, 10, 25, and 50 km. Actual ranges for the test
differed slightly from these because of terrain and military reservation
boundary limitations. Selection of ranges spaced at two to one multiples of
distance results in excess of 10 dB incremental basic transmission loss for
a groundwave signal between each site.
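
The 10 dB figure can be made plausible with a rough calculation. The sketch below assumes that basic transmission loss over ground grows as the fourth power of distance (40 dB per decade), a common plane-earth idealization that is not stated in the paper; the function name and the exponent parameter are illustrative.

```python
import math

RANGES_KM = [1, 2.5, 5, 10, 25, 50]  # nominal test ranges


def incremental_loss_db(near_km, far_km, exponent=4.0):
    """Extra path loss in dB between two ranges for loss ~ distance**exponent."""
    return 10.0 * exponent * math.log10(far_km / near_km)


for near, far in zip(RANGES_KM, RANGES_KM[1:]):
    print(f"{near:>4} km -> {far:>4} km: {incremental_loss_db(near, far):.1f} dB")
```

Under this assumption a two to one step adds about 12 dB and a two and a half to one step about 16 dB, so every step between adjacent sites exceeds 10 dB.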

The test ranges were selected to identify the capabilities and
limitations of each of two modes of transmission: groundwave mode and near
vertical incidence skywave (NVIS) mode of propagation. VHF/FM radio systems
operate in the groundwave mode of communication in which the launched signal
generally follows the surface of the earth and is refracted or reflected by
terrain irregularities along the path profile between the transmitter and
the receiver.

Signals in the VHF portion of the spectrum (30-76 MHz) are attenuated by both
range and terrain. The test ranges of 1-10 km were selected prior to the
tests to bracket the expected failure range of the VHF systems originating at
a base station. The HF/SSB signals also propagate in groundwave mode but to
longer ranges than their VHF/FM counterpart radio.

HF/SSB radios have the capability of operating both groundwave and in
near vertical incidence skywave (NVIS) mode. For NVIS mode the energy is
directed from a horizontal radiator to the ionosphere and returned to the
surface of the earth. Due to NVIS propagation, HF/SSB has the capability to
operate at extended ranges independent of terrain effects. The 25 and 50 km
points were selected to investigate the communication performance of HF/SSB
radios in the NVIS mode.


2.2.2 Altitudes. The altitude intervals for the test were selected
from an operational standpoint. Three altitudes were used:

• Skids on ground: this altitude defines the bottom of the
NOE flight regime.

• NOE altitude: this altitude, approximately 3-ft AGL for
Fort Hood terrain, represents the top of the NOE flight
regime in the test.

• Height to break squelch altitude: this is the height above
ground to which the aircraft must climb to establish two-way
communications. This altitude is operationally significant
in that the aircraft must climb to it in order to
communicate with a remote base station. As the aircraft
climbs above the NOE regime, its vulnerability to ground-based
weapons increases.



As  can  be  seen  from  Figure  3,  the  choices  of  six  ranges  and  three  alti- 
tudes resulted  in  a grid  or  matrix  containing  18  cells.  This  matrix  consti- 
tuted the  sampling  grid. 

2.3 Sampling Plan. The tester chose a factorial analysis of the test data
to relate the performance of the candidate radio systems and the test
variables to the dependent variable, percent correct A-N score. A complete
five-factor analysis of variance was planned [2,3]. The five factors were
radio system, range, time of day, altitude, and mode of transmission (A-A,
A-G, G-A). A factorial analysis is generally used to determine the
relationship among many test variables and the outcome (A-N score). To
perform this analysis, an assumption on the distribution of the data is
required: that the data be normally distributed about the mean. The analysis
of variance program run by the tester revealed significant interactions
between the factors and also resulted in a large computed F-ratio for the
candidate radio systems. On the basis of these results, the Newman-Keuls
test was run to make pair-wise comparisons of the mean A-N scores of the
candidate radios and to determine significance. This test is also based on
the Normal assumption.

The decision to perform an analysis of variance in this manner required
iterative and equal sampling in each of the range-height cells for each of
the conditions of the variables. This resulted in multiple sampling in each
range-height cell to establish the required confidence levels. This approach
is not recommended for future tests of radio systems in which the range
characteristics of the radio systems can be estimated from propagation models.

A sequential sampling plan is more appropriate for a test program of this
nature. Under this plan, samples would be taken at each of the range-altitude
cells only until the communication effectiveness (the mean A-N test score
in this case) could be estimated with a 95% confidence level. The number of
samples required in each cell depends on the mean score and the confidence
level required. Sequential sampling is desirable to conserve expensive test
resources and to redirect those resources (helicopters) to investigate other
aspects of the NOE communication problem. A comparison of the two sampling
approaches is shown in Figure 4.


FIGURE 4. COMPARISON OF EQUAL-OCCURRENCE AND SEQUENTIAL SAMPLING PLANS

The sequential sampling approach was prepared but was not used during
the tests. This approach implies the use of a channel utility estimator and
is more operationally oriented; simply stated: Does the system work? To
frame this objective (Does the channel work?), we must first define some
quantitative measure of the term "work." This can be done arbitrarily in
percent of messages that can be correctly received in a particular test
environment, but it must be decided on before the tests are begun. This can
be achieved by prefield tests, such as screenroom tests run on radio systems
using the test material, by experienced judgment, or by both methods. As an
example, an A-N score of 80% may be a reasonable threshold between channel
acceptability and unacceptability.

A three-level  hypothesis  testing  procedure  was  proposed  for  sequential 
sampling: 

• Take  N samples  of  communication  performance  on  the  channel. 

• Form  an  unbiased  test  statistic,  based  on  the  performance  measured. 

• Use this statistic to accept one of the following hypotheses:

H1. The channel can support communications.

H2.  The  channel  cannot  support  communications. 

H3.  Cannot  be  determined.  More  samples  required. 

The  expected  results  of  such  a sampling  plan  are  shown  in  Figure  5. 

Test thresholds and required confidence levels were determined prior to
the experiment. Channel quality is measured at the required confidence level
by using repetitive samples. In Figure 5, G indicates a good channel, B
a bad channel, and no entry indicates that more samples are required. It is
proposed that once H1 or H2 has been accepted, measurements under these
test conditions will be terminated. In this manner, experimental resources
can be concentrated in areas where communication performance is at or near
the critical value.
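The three-way decision above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the threshold p0 = 0.8, the level alpha = 0.05, and the use of binomial tail probabilities as the decision rule are all hypothetical choices made here, not the procedure prepared for the FM-320 test.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def classify_channel(correct, n, p0=0.8, alpha=0.05):
    """Three-way decision for one range-altitude cell.

    p0 (acceptability threshold) and alpha are illustrative values, and the
    tail-probability rule is an assumption made for this sketch.
    """
    lower_tail = sum(binom_pmf(k, n, p0) for k in range(0, correct + 1))
    upper_tail = sum(binom_pmf(k, n, p0) for k in range(correct, n + 1))
    if lower_tail <= alpha:
        return "H2"  # results improbably poor for a channel at the threshold
    if upper_tail <= alpha:
        return "H1"  # results improbably good for a channel at the threshold
    return "H3"      # undecided: take more samples in this cell


if __name__ == "__main__":
    for correct, n in [(3, 10), (8, 10), (20, 20)]:
        print(correct, n, classify_channel(correct, n))
```

A cell that returns H3 stays in the sampling schedule; cells that return H1 or H2 are dropped, which is how resources get concentrated near the critical value.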

FIGURE 5. EXAMPLE OF SEQUENTIAL SAMPLING PLAN RESULTS

For this approach, confidence levels were determined using the binomial
distribution based on independent Bernoulli trials:

\[ P(x) = \binom{n}{x} p^{x} q^{\,n-x} \]

where

P(x) = probability that exactly x characters are correctly
received in n independent trials,

p = expected probability that a character is correctly
received (preselected threshold or desired probability),

q = 1 - p.


Suppose we wish to test the hypothesis that the channel is acceptable
(p = 0.8), using ten transmitted characters (n = 10), of which three are
correctly received (x = 3).

\[ P(3) = \binom{10}{3} (0.8)^{3} (0.2)^{7} = 7.864 \times 10^{-4} \]

Under these conditions, the probability of receiving exactly 3 characters
correctly, assuming that the true (desired) probability is 0.8, is
approximately 10^-3, or 0.1%. Hence we can reject the hypothesis that the
channel is acceptable with confidence Q, where

\[ Q = 1 - 7.864 \times 10^{-4} = 99.92\%. \]
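The arithmetic of this worked example can be checked with a short Python sketch. The function name is hypothetical; the inputs n = 10, x = 3, and p = 0.8 are those of the example.

```python
from math import comb


def binomial_point_probability(x, n, p):
    """P(x) = C(n, x) * p**x * q**(n - x), with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p**x * q ** (n - x)


if __name__ == "__main__":
    prob = binomial_point_probability(3, 10, 0.8)
    confidence = 1 - prob
    print(f"P(3) = {prob:.3e}")      # about 7.864e-04
    print(f"Q    = {confidence:.2%}")  # about 99.92%
```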

In informal correspondence, the test officer for FM-320 estimated that,
for mean A-N test scores obtained in the field with helicopters, 85% was
acceptable without repeats, 75% was acceptable with repeats, and less than
70% was unacceptable [5].

2.4 Test Implementation. A detailed test plan was developed by the
TRADOC Combined Arms Test Activity (TCATA) to implement the test at Fort Hood,
Texas. This plan is extremely complex and is a tribute to the TCATA
organization. The test involved six helicopters visiting the 18 range-height
cells in approximately 1 hour and 30 minutes, the time duration for their
fuel load. At each cell the helicopter crew was required to send and receive
an A-N message from a ground station or another aircraft. This was done in
three 2-hour intervals in each 24-hour period: at night, during dawn, and
during the daytime hours. Over 10,000 A-N messages were transmitted,
received, and graded during the tests.

To handle the data generated by the large volume of messages, TCATA used
a remote terminal, similar to a time-sharing terminal, to enter the A-N test
scores into a central computer. The computer was an Army-owned CDC-6500
computer located at Fort Leavenworth, Kansas. The test scores were entered at
the end of each day. This type of data handling system is strongly
recommended for any future tests having large amounts of data. The system has
a number of distinct advantages:

• Mean A-N test scores for each of the candidate
radio systems were computed and updated daily.

• Cumulative results were available in real time to
the test officer and others at TCATA interested in
the progress of the tests.

• It permits ongoing analysis of the test results and,
on the basis of these results, allows room for
redirection of resources.

Figure 6 (extracted from the TCATA FM-320 Report) [2] shows a
cumulative output, called a Table of Means, for each candidate radio system
tested. This table was prepared daily for use by the test officer to evaluate
performance and to plan tests for the succeeding days.


FIGURE 6. PERCENT OF COMMUNICATION EFFECTIVENESS AT NAP-OF-THE-EARTH
ALTITUDES AT ALL RANGES

3.0 RESULTS. Figure 7 is a plot of successful communications completed
versus range for an A-N score equal to or greater than 80% for Fort Hood type
terrain. Success here is defined as an A-N score greater than or equal to
80%. The ordinate shows the percentage of the time that this score was
equaled or exceeded. The data used to generate these curves were extracted
from the TCATA FM-320 tests [2] for aircraft flying at two altitudes
(skids-on-ground and a low hover), three time periods, and three
communication modes (aircraft-to-ground, ground-to-aircraft, and
aircraft-to-aircraft).

The curves show the advantage of an improved FM system over baseline
and the improvement of a high-power HF system (400W equivalent) over a
low-power HF system (40W). For multipower radios, the minimum power setting
that achieves acceptable communication quality at the required range should
be used. Switching points for an A-N score of 85% are indicated on Figure 7.
For additional and detailed analysis of this type, the reader is directed to
reference [6].

FIGURE 7. TYPICAL COMMUNICATIONS SUCCESSFULLY COMPLETED VS RANGE (km)
FOR AN A-N SCORE > 80% FOR FT HOOD TYPE TERRAIN (FM-320)


In summary, the following information was determined from the test:

• A comparison of A-N scores for the nine systems at six
test ranges.

• The dependence of system performance on the test variables.

• The relationship of the technical characteristics, such as
power output, to communications performance.

• Areas for improvement for the combined aircraft and ground
station communication system.

4.0  LESSONS  LEARNED.  From  the  planning,  conduct,  and  analysis  of  the 
tests,  a number  of  lessons  were  learned  that  may  be  applicable  to  operational 
(and  engineering)  tests  that  the  Army  conducts  in  the  future. 

4.1 Measures of Effectiveness. The measure of effectiveness (MOE)
selected to evaluate a communication system should be operationally
significant and mission-related. A measure of effectiveness must have three
characteristics. First, it must be measurable in the field. Second, it must
be quantitative. Third, it must measure to what degree the objective is
achieved. The MOE should have operational significance to decision makers.

4.2 Sampling Plan and Statistical Analysis. If the data for the
test are to be analyzed statistically, it is recommended that the assumptions
on the forms of the expected distribution of the data be carefully reviewed
and, if possible, checked. For the FM-320 data, the tester assumed that
the A-N test scores would be normally distributed about the mean. This did
not prove to be the case. A sampling plan should be designed, written, and,
if possible, tested before implementation of a full-scale test. The choice
of an appropriate confidence level and the number of samples required under
each set of variable conditions to achieve that level should be determined.
The consequences of acquiring insufficient data (insufficient samples) should
be investigated. Distribution-free techniques should be used to estimate the
required sample sizes for this type of test, where the form of the
distribution cannot be known in advance. Finally, the sampling plan should
allow for test flexibility and redirection, if trends in the data so warrant.

4.3  Pretest  Planning  and  Other  Recommendations.  The  importance 
of  pretest  planning  cannot  be  overstressed.  It  is  important  to  review  and 
exchange  information  among  all  test  participant  agencies  and  to  change  the 
design  of  the  test  if  early  results  so  warrant.  Real-time  data  input  and 
access  are  recommended  for  tests  having  large  volumes  of  data.  Finally,  pre- 
dictions or  theoretical  modeling  should  be  accomplished  before  the  start  of 
the  test  and,  if  possible,  be  validated  as  part  of  the  test  procedure.  This 
approach is mandatory for sequential testing. The test plan should obtain
data  for  at  least  a spot  check  of  any  models  which  will  later  be  used  to 
extrapolate  the  test  results  to  other  situations. 

5.0 CONCLUSIONS. The results of an operational/engineering test will
be only as good as the planning inputs, the implementation of the test plan,
and the analysis and reporting of the results.

In the fall of 1976, a large-scale NOE communication test was performed
by TCATA, which required 50-60 personnel, used 1,000 hours of helicopter
flight time with 6 aircraft, and used 10,000 alpha-numeric messages.
This test was performed by TCATA with test inputs from the U.S. Army Avionics
Research and Development Activity and the U.S. Army Aviation Center to
evaluate comparatively nine candidate radio communication systems.

The results of this test, supplemented by additional analysis and computer
predictions, were a determining factor in the selection of a NOE radio
system for U.S. Army helicopters. The system selected will provide
acceptable air-to-air, air-to-ground, and ground-to-air communications for
helicopters operating in the NOE flight regime, and will represent a
significant improvement over the present communications capability of Army
helicopters.

ABSTRACT. Results obtained since a paper of the same title was presented
at the Twenty-Second Army Conference on Design of Experiment are described.
An improved table look-up algorithm and more refined error norms are used.
Comparison of the generator with several others is made.

1. INTRODUCTION. A paper with the same title was presented at the
Twenty-Second Army Conference on Design of Experiment (Shepherd and Hynes [1]).
We now present some results obtained since then. Some duplication will,
of course, occur.


With

\[ P(t) = \frac{1}{2} + \frac{1}{\sqrt{2\pi}} \int_0^t e^{-v^2/2}\,dv, \tag{1.1} \]

\[ G(t) = P^{-1}(t), \]

and $\{u_i\}$ any output sequence for a uniform random number generator with
density function equal to 1 over [0,1] and 0 elsewhere, the sequence
$\{G(u_i)\}$ can be thought of as the output sequence of an N(0,1) random
number generator (Abramowitz and Stegun [2], page 950).
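The inverse-distribution idea can be illustrated with a minimal Python sketch. It is not the spline method of this paper: it assumes the standard library's `statistics.NormalDist.inv_cdf` as a stand-in for G, and the helper names are hypothetical.

```python
import random
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def G(u):
    """Inverse of the standard normal distribution function, for 0 < u < 1."""
    return STANDARD_NORMAL.inv_cdf(u)


def normal_from_uniform(rng, count):
    """Map uniform deviates u to normal deviates G(u)."""
    out = []
    while len(out) < count:
        u = rng.random()      # uniform on [0, 1)
        if u > 0.0:           # G is undefined at exactly 0
            out.append(G(u))
    return out


if __name__ == "__main__":
    print(normal_from_uniform(random.Random(1), 5))
```

The cost of each deviate is dominated by the evaluation of G, which is the difficulty the spline approximation is meant to remove.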

A difficulty in using this idea is in the computation of G(u). The earlier
report described an interpolating quadratic spline which, once constructed,
alleviates the difficulty and approximates G(u) to within a prescribed
accuracy. We now describe a somewhat improved procedure for constructing the
spline. Blair, Edwards, and Johnson [3] furnished us with a faster
computation of the norm of the error, which in turn allowed a finer
determination of the norm. At the same time, more compact storage of the
coefficients was devised. Our experience with some uniform random number
generators is presented, and the results of some statistical tests are given.
The normal random number generator is compared with some others.


2. THE SPLINE APPROXIMATION FOR G(t). From symmetry of {(t, G(t))} about
(1/2, G(1/2)), we need consider only 1/2 ≤ t < 1. First consider the knots

\[ \tfrac{1}{2} = t_0 < t_1 < \cdots < t_{2N} < 1. \tag{2.1} \]

A continuously differentiable quadratic spline, x, with knots
$\{t_i\}_{i=0}^{2N}$ can be represented by

\[ x(t) = x(t_{2i+1}) + x'(t_{2i+1})(t - t_{2i+1}) + \tfrac{1}{2}\,x''(t_{2i+1}^{-})(t - t_{2i+1})^2 \tag{2.2} \]

for $t_{2i} \le t \le t_{2i+1}$ and

\[ x(t) = x(t_{2i+1}) + x'(t_{2i+1})(t - t_{2i+1}) + \tfrac{1}{2}\,x''(t_{2i+1}^{+})(t - t_{2i+1})^2 \tag{2.3} \]

for $t_{2i+1} \le t \le t_{2i+2}$,

i = 0, 1, ..., N-1.

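It can be shown that

\[ x(t_{2i}) = G(t_{2i}), \quad x'(t_{2i}) = G'(t_{2i}), \quad i = 0, 1, \ldots, N \tag{2.4} \]

if and only if

\[ x'(t_{2i+1}) = \frac{2}{t_{2i+2} - t_{2i}} \Big[ G(t_{2i+2}) - G(t_{2i}) - \tfrac{1}{2}(t_{2i+2} - t_{2i+1})\,G'(t_{2i+2}) - \tfrac{1}{2}(t_{2i+1} - t_{2i})\,G'(t_{2i}) \Big], \tag{2.5} \]

\[ x(t_{2i+1}) = G(t_{2i}) + \tfrac{1}{2}(t_{2i+1} - t_{2i})\big(G'(t_{2i}) + x'(t_{2i+1})\big), \tag{2.6} \]

\[ x''(t_{2i+1}^{-}) = \frac{1}{t_{2i+1} - t_{2i}} \big( x'(t_{2i+1}) - G'(t_{2i}) \big), \tag{2.7} \]

\[ x''(t_{2i+1}^{+}) = \frac{1}{t_{2i+2} - t_{2i+1}} \big( G'(t_{2i+2}) - x'(t_{2i+1}) \big). \tag{2.8} \]

Since G(1) = ∞, we must choose $t_{2N} < 1$. Hence in extending x over
$[t_{2N}, 1]$, we depart from interpolation and require that

\[ x(t_{2N}) = G(t_{2N}), \quad x'(t_{2N}) = G'(t_{2N}), \tag{2.10} \]

\[ \int_{t_{2N}}^{t_{2N+1}} x(t)\,dt = \int_{t_{2N}}^{t_{2N+1}} G(t)\,dt =: A_1, \tag{2.11} \]

\[ \int_{t_{2N+1}}^{1} x(t)\,dt = \int_{t_{2N+1}}^{1} G(t)\,dt =: A_2, \tag{2.12} \]

\[ t_{2N+1} = \tfrac{1}{2}(1 + t_{2N}). \tag{2.13} \]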

$A_1$ and $A_2$ can be evaluated by the formula

\[ \int_a^b G(t)\,dt = \frac{1}{\sqrt{2\pi}} \Big[ e^{-[G(a)]^2/2} - e^{-[G(b)]^2/2} \Big]. \tag{2.14} \]

((2.14) can be obtained by the change of variables t = P(u).)
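As a numerical sanity check on (2.14), the following Python sketch compares the closed form with a midpoint-rule quadrature of G. It assumes `statistics.NormalDist.inv_cdf` as an accurate reference for G; the function names and the interval [0.6, 0.9] are illustrative choices only.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

_STD = NormalDist()


def G(t):
    """Reference inverse normal distribution function, 0 < t < 1."""
    return _STD.inv_cdf(t)


def integral_of_G_closed_form(a, b):
    """Right-hand side of (2.14), valid for 0 < a < b < 1."""
    return (exp(-G(a) ** 2 / 2) - exp(-G(b) ** 2 / 2)) / sqrt(2 * pi)


def integral_of_G_midpoint(a, b, steps=20000):
    """Midpoint-rule approximation of the integral of G over [a, b]."""
    h = (b - a) / steps
    return h * sum(G(a + (k + 0.5) * h) for k in range(steps))


if __name__ == "__main__":
    print(integral_of_G_closed_form(0.6, 0.9))
    print(integral_of_G_midpoint(0.6, 0.9))
```

For the tail area $A_2$ the upper limit is b = 1, where G is infinite and the second exponential vanishes, so only the first term of the closed form remains.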

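With $\Delta := \tfrac{1}{2}(1 - t_{2N})$, the conditions (2.10), ..., (2.13)
are equivalent to

\[ x''(t_{2N+1}^{-}) = \frac{6}{\Delta^3} \Big( A_1 - G(t_{2N})\,\Delta - \tfrac{1}{2}\,G'(t_{2N})\,\Delta^2 \Big), \tag{2.15} \]

\[ x'(t_{2N+1}) = G'(t_{2N}) + x''(t_{2N+1}^{-})\,\Delta, \tag{2.16} \]

\[ x(t_{2N+1}) = G(t_{2N}) + G'(t_{2N})\,\Delta + \tfrac{1}{2}\,x''(t_{2N+1}^{-})\,\Delta^2, \tag{2.17} \]

\[ x''(t_{2N+1}^{+}) = \frac{6}{\Delta^3} \Big( A_2 - x(t_{2N+1})\,\Delta - \tfrac{1}{2}\,x'(t_{2N+1})\,\Delta^2 \Big). \tag{2.18} \]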

Figure 1 illustrates G(t) and x(t) for $t_{2N} \le t < 1$. With this
extension, x(t) is a simple quadratic spline over [1/2, 1]. Table 1 gives the
4(N+1) coefficients corresponding to the odd numbered knots.
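Evaluating one piece of the spline from a stored row can be sketched as follows. The row is the first surviving row of Table 1. The column meanings are an assumption made here, not stated in the text: the first three columns are taken to be the center knot, x, and x' there, and the last two are taken to be the coefficients of $(t - t_{2i+1})^2$ to the left and right of the center knot (that is, already including the factor 1/2). The function name is hypothetical, and `statistics.NormalDist.inv_cdf` serves only as a reference value.

```python
from statistics import NormalDist

# First surviving row of Table 1 (column interpretation is assumed, see above).
ROW = (0.995536, 2.6147976, 76.246729, 6613.0692, 8795.6899)


def eval_piece(t, row):
    """Evaluate the two-sided quadratic about the center knot stored in `row`."""
    center, value, slope, quad_left, quad_right = row
    d = t - center
    quad = quad_left if d < 0 else quad_right   # one-sided quadratic terms
    return value + slope * d + quad * d * d


if __name__ == "__main__":
    reference = NormalDist().inv_cdf(0.9955)
    print(eval_piece(0.9955, ROW), reference)
```

Because all pieces are expanded about the center knot, one table row serves both halves of a knot interval, which is what makes the storage compact.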

Table 1. Center Knot Values and Coefficients for the Spline Approximation

.995536  2.6147976  76.246729  6613.0692  8795.6899
.996547  2.7013401  95.738746  10356.157  14967.435
.998500  2.9614552  175.59911  22099.942  213300.23

3. THE KNOT SEQUENCE AND SEARCH ALGORITHM. We now turn to the determination
of a suitable set of knots. This set of knots must be chosen with a
number of things in mind. The accuracy of our approximation depends on the
knot spacing. The efficiency of our search algorithm in the table look-up
depends on the exact placement of the knots, and the amount of storage
depends on the number of knots used. These three considerations will be
used simultaneously to obtain our knot sequence.

The accuracy to which we wish to approximate G(t) depends very much on the
uniform generator used and the machine on which this algorithm is to be
implemented. The most common (and most efficient) type of generator is the
linear congruential type. Knuth [4], chapter 3, presents an excellent
discussion of the linear congruential generator as well as some alternatives.

It has been shown (Coveyou and MacPherson [5] and Knuth [4]) that in a
35-bit word (as in the case of the UNIVAC 1108) using a linear congruential
generator one can expect to have successive pairs of numbers independent
only to an accuracy of about 10^-4. Successive K-tuples for K > 2 are
independent for even smaller accuracies. It would be wasteful of effort to
approximate G(t) to any greater accuracy for use with this uniform
generator. It should be noted that the generator developed here is not suited
for use in high resolution applications. If a greater accuracy is needed
and a suitably accurate generator is obtained, a new knot sequence could be
formed to make the spline sufficiently accurate.

The search algorithm determines, for any given t, a value of j so that
$t_{2j} \le t < t_{2j+2}$. Instead of using some binary search technique or a
Fibonacci search, it was discovered that if we placed the knots carefully we
could very simply compute an index from the value of t and then look up the
value of j in a table using this index.

Let us choose the knots so that each even indexed knot is a multiple of .01
and also so that the maximum error over each interval $[t_{2j}, t_{2j+2}]$ is
less than or equal to 10^-4. Further, we want each interval as long as
possible to minimize the number of knots. This gives us the values listed in
Table 2. Note that at .97 it is no longer possible to maintain an accuracy of
10^-4 and a minimum spacing of .01. For any t in [.5, .97] let the index
$I_t$ be given by

\[ I_t = \lfloor 100t \rfloor - 49. \tag{3.1} \]

This is very simple and fast to compute in FORTRAN. It is the index of the
interval of length .01 in which t is to be found. Since all of the knots
are on the boundaries of the intervals, we can look up in a table exactly
which knot interval this index belongs to.

For t > .97 we have a problem. The value .97 is not close enough to 1 to
use the equal area criterion on this last interval, so we need more knots.


Table 2


Table 3


Starting at .97 we let the knots be multiples of .001. (See Table 3.)
We then use the index

\[ I_t = \lfloor 1000t \rfloor - 976 \tag{3.2} \]

and look up the knot interval in a second index table. At the value .997
we can no longer maintain the accuracy 10^-4 and the spacing .001. At this
point, we use the equal area condition on the interval [.997, 1].

At the cost of extra storage and one further test, we could have formed a
third sequence of knots starting at .997 with a spacing of .0001. We feel
that the gain in accuracy near 1 does not justify the extra cost of 40 words
of storage and one extra test.

We should mention here exactly what we mean by maximum error and how we
compute the knots. To compute the maximum error over an interval, we compute
the absolute difference between our approximant and an accurate rational
approximation at 100 equally spaced points in the interval. (See Blair,
Edwards, and Johnson [3].) The odd indexed knot $t_{2j+1}$ is chosen inside
the interval to an accuracy of .1% of the interval length to minimize the
maximum error over the interval. Starting with $t_0 = 1/2$, for
j = 0, 1, 2, ..., N-1 we compute $t_{2j+1}$ and $t_{2j+2}$ simultaneously to
give the largest interval so that (1) the maximum error is minimized with
respect to placement of the center knot, (2) the maximum error is less than
or equal to 10^-4, and (3) the interval length is a multiple of .01.

The search algorithm is the following:

1. Input t (t is in [.5, 1])
2. If t > .97, skip to 5
3. $I_t = \lfloor 100t \rfloor - 49$
4. Return j = M1($I_t$) (see Table 4)
5. $I_t = \lfloor 1000t \rfloor - 976$
6. Return j = M2($I_t$) (see Table 5)
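The table look-up idea in the steps above can be sketched in Python. Everything specific here is hypothetical: the knot lists are invented for illustration and are not the values of Tables 2 and 3, the index tables M1 and M2 are rebuilt from those invented knots, and the sketch uses zero-based offsets (50 and 970) rather than the one-based constants printed in (3.1) and (3.2).

```python
from bisect import bisect_right

# Hypothetical even-indexed knots, in thousandths (NOT the paper's tables).
COARSE_KNOTS = [500, 600, 700, 800, 850, 900, 930, 950, 960, 970]  # multiples of .01
FINE_KNOTS = [970, 980, 985, 990, 993, 995, 996, 997]              # multiples of .001


def build_index_table(knots, cell):
    """Map every cell of width `cell` (in thousandths) to its knot interval j."""
    return [bisect_right(knots, start) - 1
            for start in range(knots[0], knots[-1], cell)]


M1 = build_index_table(COARSE_KNOTS, 10)  # cells of width .01 on [.50, .97)
M2 = build_index_table(FINE_KNOTS, 1)     # cells of width .001 on [.970, .997)


def _clamp(i, table):
    """Guard against floating-point rounding at the ends of a table."""
    return min(max(i, 0), len(table) - 1)


def find_interval(t):
    """Return (table name, j) using one multiply, one truncation, one lookup."""
    if not 0.5 <= t < 1.0:
        raise ValueError("t must lie in [.5, 1)")
    if t >= 0.997:
        return ("tail", 0)                       # equal-area interval [.997, 1]
    if t >= 0.97:
        return ("fine", M2[_clamp(int(1000 * t) - 970, M2)])
    return ("coarse", M1[_clamp(int(100 * t) - 50, M1)])


if __name__ == "__main__":
    for t in (0.55, 0.655, 0.955, 0.9875, 0.999):
        print(t, find_interval(t))
```

The look-up costs the same for every t, whereas a binary search would need several comparisons; the price is one table entry per cell rather than one per knot.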

4. A SPECIFIC GENERATOR AND STATISTICAL TESTING. In this section, we study
a specific generator and present some empirical statistical tests. The tests
are designed to study the distribution and serial correlation of the sequences
generated by our algorithm. We choose a particular linear congruential
uniform generator and use the tables presented here to form our generator.
The uniform generator chosen was designed for a 35-bit integer word, and all
of the tests were performed on a UNIVAC 1108 computing system.

The uniform generator used is of the form

\[ U_{n+1} = (A U_n + C) \bmod m, \tag{4.1} \]

where $m = 2^{35}$ and A and C are chosen to give the uniform generator good
statistical properties. The numbers $U_n$ are the integers in the range
$0 \le U_n \le 2^{35} - 1$. To obtain a value in (0, 1) simply divide $U_n$
by $2^{35}$.


The multiplier A is chosen to obtain good results in the spectral test.
(See Knuth [4] and Coveyou and MacPherson [5].) Coveyou and MacPherson
present several values of A and the results of the spectral test for each.
We choose from their results the value A = 27214903917. Knuth presents
the following criterion for choosing C. To minimize the serial pairwise
correlation over the entire period, let

\[ C \approx m\left(\tfrac{1}{2} \pm \tfrac{1}{6}\sqrt{3}\right), \tag{4.2} \]

where $m = 2^{35}$ and C is odd. We choose the minus sign, which gives the
value C = 7261067085.
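A minimal Python sketch of the generator (4.1) follows. The constants are copied as they are printed in the text; the function names and the seed are hypothetical. Python integers are unbounded, so the 35-bit modular arithmetic needs no special handling here.

```python
M = 2 ** 35
A = 27214903917   # multiplier, as printed in the text
C = 7261067085    # increment, as printed in the text


def lcg_next(u):
    """One step of U_{n+1} = (A * U_n + C) mod m."""
    return (A * u + C) % M


def uniforms(seed, count):
    """Return `count` values U_n / m, starting from the state after `seed`."""
    u = seed % M
    out = []
    for _ in range(count):
        u = lcg_next(u)
        out.append(u / M)
    return out


if __name__ == "__main__":
    print(uniforms(seed=1, count=3))
    # Criterion (4.2) with the minus sign, for comparison with C:
    print(M * (0.5 - 3 ** 0.5 / 6), C)
```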

The first test performed on the normal generator is the Kolmogorov-Smirnov
test. (See Knuth [4].) This test studies the distribution of sequences
obtained from the generator. The empirical distributions of sequences of
modest length (1000 numbers) are compared with the normal distribution.
The maximum positive deviation ($K^+$) and maximum negative deviation ($K^-$)
are determined for each sequence. The distribution of the values of
$K^+$ and of $K^-$ should be close to the Kolmogorov-Smirnov distribution. The
deviations of these distributions from the Kolmogorov-Smirnov distribution
are well within the confidence limits set forth by Knuth. Figure 2 shows
these empirical distributions and the Kolmogorov-Smirnov distribution.

The second test is a measurement of the serial correlation for normally
distributed sequences. We compute the following statistic for serial
correlation:

\[ C = \frac{N(X_1 X_2 + X_2 X_3 + \cdots + X_N X_1) - \Big(\sum_{j=1}^{N} X_j\Big)^2}{N \sum_{j=1}^{N} X_j^2 - \Big(\sum_{j=1}^{N} X_j\Big)^2}. \tag{4.3} \]

Anderson [6] has shown that for a truly random sequence of numbers the
distribution of the serial correlation coefficients is, for large N,
asymptotically normal with mean $-\frac{1}{N-1}$ and variance
$\frac{N-2}{(N-1)^2}$. Figure 3 shows a comparison of the empirical
distribution of the serial correlation coefficients of 50 sequences of 1000
numbers with the normal distribution with mean $-\frac{1}{999}$ and variance
$\frac{998}{999^2}$. The agreement is quite good.
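The statistic (4.3) is straightforward to compute. The sketch below is one possible Python rendering; the function name is hypothetical, and `random.gauss` is used only to produce an illustrative input sequence, not the generator of this paper.

```python
import random


def serial_correlation(xs):
    """Circular lag-1 serial correlation coefficient of (4.3)."""
    n = len(xs)
    cyclic = sum(xs[j] * xs[(j + 1) % n] for j in range(n))  # X_N pairs with X_1
    total = sum(xs)
    squares = sum(x * x for x in xs)
    return (n * cyclic - total ** 2) / (n * squares - total ** 2)


if __name__ == "__main__":
    rng = random.Random(7)
    sample = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    print(serial_correlation(sample))
```

For the three-point sequence 1, 2, 3 the numerator is 3(2 + 6 + 3) - 36 = -3 and the denominator is 3(14) - 36 = 6, so the coefficient is -0.5.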

5. COMPARISON WITH OTHER ALGORITHMS. We now compare our algorithm with
two other popular algorithms in terms of speed, storage requirements, and
ease of programming. There are many algorithms, and a discussion of most
of the algorithms in use can be found in a paper by Ahrens and Dieter [7].
The two we choose here are probably the most commonly used.

One of the most popular algorithms is the polar algorithm. This is a
modification by Marsaglia of the Box-Muller algorithm. The algorithm requires
one floating divide, one square root, and one natural logarithm to generate
two random numbers. It also requires approximately 2.5 uniform random
numbers to generate two normal random numbers. The algorithm requires very
little storage and is very easy to program. The difficulty with this
algorithm is speed. The special function calls are very expensive.
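For comparison, the polar algorithm described above can be sketched in Python as follows. The function name and the returned count of uniforms are additions for illustration; Python's `random.Random` stands in for the uniform generator.

```python
import math
import random


def polar_pair(rng):
    """Return two N(0,1) deviates and the number of uniforms consumed."""
    used = 0
    while True:
        v1 = 2.0 * rng.random() - 1.0
        v2 = 2.0 * rng.random() - 1.0
        used += 2
        s = v1 * v1 + v2 * v2
        if 0.0 < s < 1.0:                 # accept points inside the unit circle
            # One divide, one logarithm, one square root per accepted pair.
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return v1 * factor, v2 * factor, used


if __name__ == "__main__":
    print(polar_pair(random.Random(2024)))
```

A point is accepted with probability π/4, so each pair of normal deviates consumes on average 2/(π/4), or about 2.5, uniform deviates, in agreement with the figure quoted above.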

Marsaglia, MacLaren, and Bray [8] present a faster algorithm (the
rectangle-wedge-tail algorithm), which is based on the decomposition of the
normal distribution into simple distributions. This algorithm is very fast,
but requires much extra storage for tables. Further, to take full advantage
of the speed of this algorithm, it should be programmed in machine language.
There is no question that a machine language version of this algorithm is
the fastest available; however, the difficulty of programming makes this
algorithm somewhat inaccessible.

Our algorithm (the inverse distribution algorithm) requires some extra
storage for tables. The amount required is, however, considerably less than
for the rectangle-wedge-tail algorithm. A FORTRAN implementation of our
algorithm is also faster than a FORTRAN implementation of the
rectangle-wedge-tail algorithm and is considerably faster than the polar
method. Table 6 shows approximate times for the generation of one number with
FORTRAN implementations of each of the algorithms and the approximate amount
of extra storage required. The timings were made on the UNIVAC 1108 with the
FORTRAN V compiler.


Table 6

Algorithm               Time        Storage

Polar                   102 μsec    --
Rectangle-Wedge-Tail     82 μsec    707 words
Inverse Distribution     70 μsec    176 words

6. CONCLUSIONS. We have presented an algorithm for the generation of
normally distributed random numbers. This algorithm is designed to be
implemented in a high-level programming language such as FORTRAN. Compared
with other good algorithms in FORTRAN implementations, our algorithm is the
fastest and requires only a modest amount of storage. Because this
algorithm is to be programmed in FORTRAN, it is portable. One must, of course,
obtain a uniform generator that is designed for a given machine; however,
the inverse distribution calculation is entirely machine independent. The
inverse distribution approximation is accurate to 10^-4; however, the uniform
numbers are at best independent to four places. A greater degree of
accuracy is unnecessary and would materially add to the number of knots, which
affects both the efficiency and storage requirements. This method should
be used in any application not requiring high resolution where ease of
programming and speed are important and storage is not critical. This method
can be used, with the appropriate table, to generate random sequences from
any continuously differentiable distribution function.

IN MARKOV CHAINS


Richard  M.  Brugger 
Quality  Evaluation  Division 
Product  Assurance  Directorate 
US  Army  Armament  Materiel  Readiness  Command 
Rock Island, Illinois


ABSTRACT.  Some  procedures  for  solving  for  steady  state  probabilities 
are  more  complicated  than  necessary.  This  paper  shows  that  by  not  intro- 
ducing the  equation  reflecting  that  the  sum  of  steady  state  probabilities 
is  one  into  the  matrix  solution,  the  work  becomes  easier. 

I.  INTRODUCTION.  This  paper  deals  with  the  matter  of  determining 
steady  state  probability  expressions  for  Markov  chains.  In  particular, 
it  deals  with  the  matter  of  working  with  the  set  of  equations  from  which 
the  steady  state  probability  expressions  are  derived. 

As is well known, Markov chain methodology is often useful, and is
sometimes  the  only  methodology  available,  for  dealing  with  certain  types 
of  problems  related  to  such  applications  as  determining  sampling  plan 
properties  or  analyzing  the  characteristics  of  a weapons  system. 

The motivation for this paper arose from a training course in which
the author was enrolled. In this training course, a method of solution
for the steady state probability expressions was presented which was
much more complicated than the method which the author had been using. Reviewing
some  of  the  more  well-known  textbooks  that  included  material  on  Markov 
chains,  it  was  noted  that  mathematical  concepts  of  solution  were  presented, 
but  generally  no  algorithms  were  provided  to  carry  out  these  mathematical 
concepts.  This  paper,  then,  without  benefit  of  references,  will  provide 
the  algorithm  from  the  training  course  and  a simpler  algorithm  that  the 
author  has  been  using  for  some  time.  This  simpler  algorithm  may  not  be 
well  known,  since,  as  mentioned,  the  better  known  textbooks  on  Markov 
chains  tend  to  avoid  detailed  descriptions  of  algorithms. 

Throughout  the  paper,  ergodic  chains  only  are  considered. 

II.  THE  LONGER  METHOD.  As  an  example,  consider  the  chain  represented 
by  the  matrix  in  Figure  1. 
In this chain, p + q = 1. Let P(Sj) represent the steady state probability of state j. From the matrix, proceeding column by column, we can extract the following set of equations:

$$P(S_1) = p\,P(S_1) + p\,P(S_3) \tag{1}$$
$$q\,P(S_1) - p\,P(S_3) = 0 \tag{2}$$
$$P(S_2) = (q/2)\,P(S_1) + p\,P(S_2) \tag{3}$$
$$(1/2)\,P(S_1) - P(S_2) = 0 \tag{4}$$
$$P(S_3) = (q/2)\,P(S_1) + q\,P(S_2) + q\,P(S_3) \tag{5}$$
$$(q/2)\,P(S_1) + q\,P(S_2) - p\,P(S_3) = 0 \tag{6}$$

Taking equations (2), (4), and (6) from above and taking into account that the sum of the steady state probabilities is equal to one, we have the following set of linear equations:

$$q\,P(S_1) - p\,P(S_3) = 0 \tag{2}$$
$$(1/2)\,P(S_1) - P(S_2) = 0 \tag{4}$$
$$(q/2)\,P(S_1) + q\,P(S_2) - p\,P(S_3) = 0 \tag{6}$$
$$P(S_1) + P(S_2) + P(S_3) = 1 \tag{7}$$


We  shall  see  later  that  including  equation  (7)  at  this  time  was 
not  wise. 

We  have  a set  of  four  equations  in  three  unknowns,  and  we  know  that 
since  the  chain  is  ergodic,  exactly  two  of  the  equations  can  be  trans- 
formed into  linearly  dependent  equations,  thus  producing  a degeneracy 
which  in  effect  reduces  the  set  of  equations  to  three  linearly 
independent  ones. 

The  process  of  working  with  these  equations  to  attain  this  degeneracy 
can  be  done  in  a variety  of  ways.  A standard  approach  is  the  so-called 
sweep  out  method,  which  we  shall  use  here. 


$$P(S_1) - (p/q)\,P(S_3) = 0 \tag{8}$$
$$P(S_1) - 2P(S_2) = 0 \tag{9}$$
$$P(S_1) + 2P(S_2) - 2(p/q)\,P(S_3) = 0 \tag{10}$$
$$P(S_1) + P(S_2) + P(S_3) = 1 \tag{11}$$

Subtracting (8) from (9), (10), and (11), we obtain:

$$P(S_1) - (p/q)\,P(S_3) = 0 \tag{12}$$
$$-2P(S_2) + (p/q)\,P(S_3) = 0 \tag{13}$$
$$2P(S_2) - (p/q)\,P(S_3) = 0 \tag{14}$$
$$P(S_2) + (1 + (p/q))\,P(S_3) = 1 \tag{15}$$

We see that equation (13) is simply minus one times equation (14),
so we will discard equation (13).

Proceeding we obtain:

$$P(S_1) - (p/q)\,P(S_3) = 0 \tag{16}$$
$$P(S_2) - (p/(2q))\,P(S_3) = 0 \tag{17}$$
$$P(S_2) + (1 + (p/q))\,P(S_3) = 1 \tag{18}$$

Continuing, we obtain:

$$P(S_1) - (p/q)\,P(S_3) = 0 \tag{19}$$
$$P(S_2) - (p/(2q))\,P(S_3) = 0 \tag{20}$$
$$(1 + (p/q) + (p/(2q)))\,P(S_3) = 1 \tag{21}$$

Solving for P(S3) in (21) and doing appropriate substitutions in (20) and (19), we finally obtain:

$$P(S_3) = 2q/(2 + p) \tag{22}$$
$$P(S_2) = p/(2 + p) \tag{23}$$
$$P(S_1) = 2p/(2 + p) \tag{24}$$

It  can  be  seen  that  even  with  a very  simple  example,  a great  deal 
of  effort  was  expended  in  order  to  obtain  a solution  using  this  long 
method. 
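
The result (22)-(24) can be checked numerically. The Python sketch below assumes the transition matrix implied by equations (1), (3), and (5), since Figure 1 is not reproduced here, and confirms for one value of p that the solution sums to one and is unchanged by one step of the chain. The function names are illustrative only.

```python
from fractions import Fraction


def transition_matrix(p):
    """Transition matrix read off from equations (1), (3), and (5); rows are 'from' states."""
    q = 1 - p
    return [
        [p, q / 2, q / 2],  # from S1
        [0, p, q],          # from S2
        [p, 0, q],          # from S3
    ]


def closed_form(p):
    """Steady state probabilities [P(S1), P(S2), P(S3)] from (24), (23), (22)."""
    q = 1 - p
    return [2 * p / (2 + p), p / (2 + p), 2 * q / (2 + p)]


def step(pi, P):
    """One step of the chain: new_j = sum over i of pi_i * P[i][j]."""
    n = len(pi)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]


p = Fraction(1, 3)                      # any 0 < p < 1 will do
pi = closed_form(p)
print(pi, sum(pi), step(pi, transition_matrix(p)) == pi)
```

Exact rational arithmetic is used so that the comparison is not blurred by rounding.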


III. THE SHORTER METHOD. Refer again to Figure 1, the transition matrix for this example. We will work with the matrix differently at this time. First, we select the most complicated looking column of the matrix. This is column S3, since it contains an element in each row. We will then proceed to solve for each steady state probability in terms of P(S3). We thus obtain:

$$P(S_1) = p\,P(S_1) + p\,P(S_3) \tag{25}$$
$$P(S_1) = (p/q)\,P(S_3) \tag{26}$$
$$P(S_2) = (1/2)\,q\,P(S_1) + p\,P(S_2) \tag{27}$$
$$P(S_2) = (1/2)\,P(S_1) = (p/(2q))\,P(S_3) \tag{28}$$
$$P(S_3) = P(S_3) \tag{29}$$


It is interesting to note that (29) permits us to disregard all of the elements in column S3. This is why we selected the most complicated looking column, because by so doing we eliminate more work.

Since the sum of the steady state probabilities equals one, and since

$$P(S_j) = \frac{a_j\,P(S_3)}{\sum_{i=1}^{3} a_i\,P(S_3)}, \qquad j = 1, 2, 3,$$

(where $a_j$ represents the coefficient of P(S3) in (26), (28), and (29)), and since P(S3) cancels from each term, we can immediately write the solution as:

$$P(S_1) = 2p/(2 + p) \tag{30}$$
$$P(S_2) = p/(2 + p) \tag{31}$$
$$P(S_3) = 2q/(2 + p) \tag{32}$$

As is obvious, this is much simpler than the other method.
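
The shorter method lends itself to a compact program. The Python sketch below is one possible reading of it, not the author's own procedure: a reference state is fixed with coefficient 1, the balance equations of the other columns are solved for the remaining coefficients, and the coefficients are normalized at the very end. The hand substitution of the text is replaced here by Gauss-Jordan elimination; the function name `steady_state` and the matrix layout (rows as "from" states, as implied by equations (1), (3), and (5)) are assumptions.

```python
from fractions import Fraction


def steady_state(P, ref):
    """Steady state probabilities of an ergodic chain, with `ref` as reference state."""
    n = len(P)
    others = [j for j in range(n) if j != ref]
    m = len(others)
    # Balance equation for column j (j != ref), with a[ref] fixed at 1:
    #   a_j - sum over i != ref of a_i * P[i][j] = P[ref][j]
    A = [[Fraction((1 if i == j else 0) - P[i][j]) for i in others]
         + [Fraction(P[ref][j])]
         for j in others]
    # Gauss-Jordan elimination on the (n-1) x (n-1) augmented system.
    for col in range(m):
        pivot = next(r for r in range(col, m) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [x / A[col][col] for x in A[col]]
        for r in range(m):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    a = [Fraction(1)] * n            # a[ref] stays 1, as in equation (29)
    for k, j in enumerate(others):
        a[j] = A[k][m]
    total = sum(a)                   # the sum-to-one condition enters only here
    return [x / total for x in a]


p = Fraction(1, 3)
q = 1 - p
P = [[p, q / 2, q / 2],
     [0, p, q],
     [p, 0, q]]
print(steady_state(P, ref=2))        # coefficients expressed relative to S3
```

Because the normalization is deferred, the linear system has one fewer equation and one fewer unknown than the set (2), (4), (6), (7), which is the saving the text describes.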
The exponential is wrong
But works like a song.
Beware the Weibull:
It's incorrigible. -- Anon.

All models are wrong.
Some work. -- G. E. P. Box

ABSTRACT. The assumption that failures follow the exponential distribution is almost universally accepted in reliability analysis. Two reasons are given for this assumption: (1) It is commonly assumed that electronic components do not wear out but are subject to random "shocks" which may cause failure. If these shocks form a Poisson process, the underlying failure distribution is exponential. (2) Sufficiently complex equipment run for a sufficiently long time (failed components being replaced by good ones) will follow the exponential distribution. These reasons are investigated, especially the latter one. In many cases, equipment does not last long enough to reach the steady state alluded to in (2).

1. INTRODUCTION. The exponential distribution is used, almost exclusively, for the time between failures in reliability analysis. Even when it cannot be assumed that the failure distribution of a component is exponential, the exponential distribution is used for the time between failures of systems. The rationale for this is the belief that there is a theorem which states that for large systems the time between failures is exponentially distributed. Use of the exponential distribution simplifies the analysis considerably; it is well known that systems whose failure law follows the exponential distribution do not age; the exponential failure law is the only continuous distribution with this property. Since the analysis using any other failure law complicates the solution considerably, engineers are loth to give up use of the exponential. If retaining the exponential leads to incorrect conclusions, one might say that the reliability engineer is "being seduced by an easy solution" or is "cursed by the exponential distribution". The purpose of this paper is to state, somewhat colloquially but a little more precisely, the theorem underlying the correct use of the exponential failure law for systems whose components fail according to another law, and to show the dangers when this theorem is not used correctly.

¹Preparation of this paper was partially supported by the Office of Naval Research under Contract No. N00014-77-C-0601/NR042-377.

This paper is concerned with the superimposed renewal process, illustrated in Figure 1 for the case of n = 5 components connected in series. When any component fails, the system fails. We assume that a failed component is instantly replaced by a new one. The x's indicate times of failure for each component and the bottom line indicates the failures of the renewal process or system. One version of the exponential limit theorem [4] states that if one has a renewal process consisting of n components, with identical non-exponential failure laws, connected in series, then, for n greater than some n* and t greater than some t*, the times between failures of the system are indeed exponentially distributed. Intuitively the theorem states that for a sufficiently complex system, after some time t* the components have been replaced at "random" times, and there is a random mix of ages of components. Thus the succeeding times of failure will occur at random, which is one of the postulates of a Poisson process; this in turn implies that times between failures follow the exponential law.

We  have  investigated  how  large  n*  and  t*  must  be  for  the  limit 
theorem  to  yield  a good  approximation  when  the  underlying  component 
failure  law  is  lognormal,  gamma,  or  Weibull.  For  all  those  laws  it 
appears  that  the  dependence  on  n is  not  so  crucial  as  the  dependence 
on t; it is believed, however, that reliability engineers frequently
ignore  the  dependence  on  t. 

Actually  the  exponential  limit  theorem  is  more  general  than  given 
above.  Under  certain  conditions,  the  components  need  not  all  have  the 
same  failure  distribution:  in  this  case  t*  would  have  to  be  larger  yet, 
and  the  results  given  here  would  be  even  stronger. 

2. RENEWAL DENSITY AND SYSTEM HAZARD. Although the mathematical details, which appear elsewhere [1, 2, 3], will not be repeated here, we will give some definitions, outline the techniques used, and present some cases to illustrate the results. Calculations are based on

$$h(t) = \text{renewal density of a component} = f(t) + [f(t)]^{*2} + [f(t)]^{*3} + \cdots + [f(t)]^{*n} + \cdots,$$

where $[f(t)]^{*n}$ denotes the $n$-fold convolution of $f(t)$, i.e. the density of the distribution of the time to the $n$th failure of the component, measured from the initial time; and $f(t)$ is the failure density of a component. Thus $h(t)$ is the density of all failures for a specific component and $h(t)\,dt$ is the probability that, in the interval $(t, t+dt)$, the component either fails for the first time or fails for the second time if it was replaced prior to $t$ or fails for the third time if it failed twice and was replaced prior to $t$, etc. It can be shown that $h(t) \to 1/\mu$ as $t \to \infty$, where $\mu$ is the mean time to failure of a component. Note that the renewal function

$$H(t) = \int_0^t h(x)\,dx = \text{expected number of failures up to time } t,$$

and that $H(t) \sim t/\mu - \text{constant}$, where the constant reflects the fact that, for small $t$, $h(t)$ is typically less than $1/\mu$.

Let $h_s(t)$ be the system hazard, so that $h_s(t)\,\Delta t$ is the probability that the system fails in the interval $(t, t+\Delta t)$, given that it was operating at time $t$. For $\Delta t \ll t$ the probability of more than one failure in the interval is negligible and $h_s(t)$ will be reasonably constant in the interval. These of course are the postulates of a Poisson process, and would suggest that some exponential limit will apply. In addition, $h(t)$ is an ensemble average over many components with different replacement histories. It is not an appropriate failure rate to use at time $t$ for a component last replaced at some known time; in a system of $n$ components connected in series, however, the summation of failures over components is a good approximation to this ensemble average for large $n$. Since we have assumed that failed components are instantly replaced, the mean number of system failures in the interval is rigorously $n$ times the mean number of failures of a component. Thus $nh(t)$ is rigorously the system failure rate at time $t$, computed when the system is first put on test ($t = 0$). Because of the averaging over $n$ components, $nh(t)$ is a good approximation to the system failure rate at time $t$, computed at time $t$, after we know the system history; and $h_s(t)\,\Delta t$ can also be considered as the mean number of system failures in $(t, t+\Delta t)$.

We are interested in

$$\mathcal{J} = \mathcal{J}(w; t, n) = \Pr\{\text{next failure occurs after } t+w \mid \text{present age is } t\}.$$

But, for large $n$, $\mathcal{J} \approx \mathcal{L}$, where

$$\mathcal{L} = \mathcal{L}(w; t, n) = \Pr\{\text{next failure occurs after } t+w \mid \text{failure occurred at } t\}.$$

Note that in $\mathcal{J}$ we have $n$ components with unknown ages (although we do know the distribution of those ages), while in $\mathcal{L}$ we have $n-1$ components with unknown ages and 1 component which is new at time $t$. Thus the first moment of $\mathcal{L}$ is the mean time between failures. Define $s$, a dimensionless waiting time, by $w = \mu s/n$, where $\mu$ is the average time to failure of a component and $\mu/n$ is the average waiting time between system failures. Then it has been shown [3] that, neglecting higher-order terms in $1/n$,

$$\mathcal{L}(\mu s/n; t, n) \approx \exp\{-\mu s\,h(t)\}\left\{1 - \frac{(\mu s)^2 L_1}{2n} - \frac{(\mu s)^3 L_2}{24n^2}\right\}, \tag{1}$$

where

$$L_1 = L_1\{s, h(t), h'(t)\} \quad\text{and}\quad L_2 = L_2\{s, h'(t), h''(t)\};$$

i.e. the "correction" terms depend on $\mu s$ and the renewal density and its derivatives. This dependence is reasonable. For large $w$ (earlier in this section, when relating the system hazard to the Poisson process, $w$ was denoted $\Delta t$) the system hazard $h_s(t+\theta w)$, $0 < \theta < 1$, is not a constant; so that $h_s(t) \ne h_s(t+w)$. The mean number of failures in time $w$ is given by

$$\int_t^{t+w} h_s(x)\,dx.$$

Using a Taylor expansion around $t$ for the integrand will involve the derivatives of $h$.

Now, for $n$ infinite, (1) becomes

$$\lim_{n\to\infty} \mathcal{L}(\mu s/n; t, n) = \exp\{-\mu s\,h(t)\}, \tag{2}$$

and the waiting time is characterized by a non-homogeneous Poisson process. If, furthermore, $t \to \infty$, then $h(t) \to 1/\mu$ and we have

$$\lim_{t\to\infty}\,\lim_{n\to\infty} \mathcal{L}(\mu s/n; t, n) = e^{-s}, \tag{3}$$

the limit theorem referred to in Section 1.

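We shall present results based on (1) and (2) when the underlying failure distribution is gamma or Weibull. For the gamma we have

$$f(x) = \frac{x^{\alpha-1}\exp(-x/\theta)}{\theta^{\alpha}\,\Gamma(\alpha)}, \qquad x > 0,\ \theta > 0,\ \alpha > 0; \tag{4}$$
$$\mu = \alpha\theta; \tag{5}$$

for the Weibull,

$$f(x) = \beta\theta^{-1}(x/\theta)^{\beta-1}\exp\{-(x/\theta)^{\beta}\}, \qquad x > 0,\ \theta > 0,\ \beta > 0; \tag{6}$$
$$\mu = \theta\,\Gamma(1+\beta^{-1}). \tag{7}$$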
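
For gamma components the series defining the renewal density $h(t)$ can be summed directly, because the $k$-fold convolution of the gamma density (4) is again a gamma density with shape $k\alpha$ and the same $\theta$. The Python sketch below uses that fact; it is an illustration and not the program used to produce the figures. The function names, the stopping rule, and the parameter values in the usage lines are assumptions.

```python
import math


def gamma_pdf(x, shape, scale):
    """Gamma density x^(shape-1) exp(-x/scale) / (scale^shape * Gamma(shape))."""
    if x <= 0.0:
        return 0.0
    log_pdf = ((shape - 1.0) * math.log(x) - x / scale
               - shape * math.log(scale) - math.lgamma(shape))
    return math.exp(log_pdf)


def renewal_density(t, alpha, theta, tol=1e-12, max_terms=10000):
    """h(t) for t > 0: sum over k of the k-fold convolution of the gamma density."""
    total = 0.0
    for k in range(1, max_terms + 1):
        term = gamma_pdf(t, k * alpha, theta)
        total += term
        # Stop once the k-th failure time is centred beyond t and the
        # remaining terms have become negligible.
        if k * alpha * theta > t and term < tol:
            break
    return total


alpha, theta = 12.0, 1.0
mu = alpha * theta
for t in (6.0, 12.0, 18.0, 24.0, 60.0):
    # mu * h(t) approaches 1 as t grows
    print(t, mu * renewal_density(t, alpha, theta))
```

With the limit (2), a value of $\mu h(t)$ from this function gives the survival probability as `math.exp(-s * mu * renewal_density(t, alpha, theta))`.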


3. EXAMPLES. $\mathcal{L}(w; t, n)$ is plotted as a function of $t$ in Figures 2-9 for gamma and Weibull components. The smooth curve represents $n = \infty$, + represents $n = 64$ and x represents $n = 256$. Figures 2, 4, 5 appeared in [1]; Figures 3, 6, 7, in [3]; Figures 8, 9 were used in the oral presentation of [5] but did not appear in the Proceedings and have not been published previously.

In interpreting the gamma plots, Figures 2-7, several successive transformations from real time to coded time must be made. Start with $T$, the age of the system, and $W$, the waiting time, both in clock hours; so that we are concerned with failures in the interval $(T, T+W)$. Then transform:

(a) Eliminate $\theta$ by computing $t = T/\theta$ and $w = W/\theta$.

(b) The non-dimensional waiting time is

$$s = nW/\mu = nW/(\theta\alpha) = nw/\alpha.$$

(c) The curves are indexed by $e^{-s}$, the double limit for $n$ and $T$ infinite, which is given equally spaced values from .05 to .95; thus

$$W = -(\mu/n)\log e^{-s}.$$

(d) Instead of $t$,

$$t/\alpha = T/\mu$$

was used in order to relate the plots to systems composed of elements having unit mean life regardless of $\alpha$. To have used $t$ would involve making plots for systems whose elements had different mean lives for different values of $\alpha$ and would make comparison of the results for different $\alpha$ more difficult, since both shape and mean life would be changing.

(e) Finally, $\exp(-t/\alpha)$, rather than $t/\alpha$, was taken as the argument, to "compress" the abscissa in the curves. This final normalization means that the gamma plots must be read from right to left: $t = 0$ and $\infty$ correspond to abscissas of 1 and 0 respectively. (The Weibull plots, Figures 8 and 9, read from left to right.)

The asymptotic probability $e^{-s}$ ranges from 0.05 to 0.95 by steps of 0.10 in Figures 2, 4, 5 and by steps of 0.30 in Figures 3, 6, 7, 8, 9. Thus the top curve in Figure 2 corresponds to $s = -\log(.95) \approx .05$; $\alpha = 12$, $w = \alpha s/n = .6/n$. Because $w$ depends on both $\alpha$ and $s$, each curve on any figure represents a different $w$; the same $w$, moreover, corresponds to different $W$ as $\theta$ is varied.

To illustrate these somewhat confusing transformations that take $W$ into $s$, consider a system with $n = 300$ components, $\alpha = 2$, and $\theta = 5000$ hours, so that $\mu = 10{,}000$ hours; and let the contemplated waiting time $W = 100$ hours. Then

$$s = nW/\mu = 300 \times 100/10{,}000 = 3, \qquad e^{-s} = 0.05.$$

Thus the time-equilibrium probability ($t = \infty$) that the system operates for at least 100 hours without a failure is 0.05.

FIGURE 6. $\mathcal{L}(\alpha s/n; t, n)$ for $n$ gamma components: $\alpha = 2$


As another example, suppose we desire to find the probability that a system of 100 components survives without failure for at least 24 hours when all of the components have the gamma distribution with $\alpha = 2$ and mean life 10,000 hours ($\theta = 5{,}000$ hours). The system age is $T = 10{,}000$ hours. We have $t = 2$, $t/\alpha = 1$, $s = 100 \times 24/10{,}000 = 0.24$; so that

$$e^{-s} = 0.787, \qquad e^{-t/\alpha} = 0.368.$$

Interpolating in Figure 5, we find $\mathcal{L} \approx 0.792$. Alternatively one could show that $\mu h(t) \approx 0.984$ and use (2) to obtain

$$\mathcal{L} = e^{-.24 \times .984} = e^{-.236} = 0.790.$$

This is a somewhat larger survival probability than the time-equilibrium prediction would give. The difference is more striking if we consider the probability of surviving 240 hours so that $s = 2.4$;

$$e^{-s} = 0.091 \quad\text{and}\quad \mathcal{L} = e^{-\mu s h(t)} = e^{-1.4} = 0.247,$$

which is considerably larger than the equilibrium value, 0.091. The errors in ignoring system age are seen to be far greater for large waiting times than for small ones.
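
The arithmetic of the 100-hour and 24-hour examples can be retraced in a few lines of Python. This is only a restatement of the numbers in the text: the value 0.984 for $\mu h(t)$ is taken from the text as given rather than recomputed, and the function name is an assumption.

```python
import math


def dimensionless_wait(n, wait_hours, mean_life_hours):
    """s = n W / mu, the waiting time in units of the mean system waiting time."""
    return n * wait_hours / mean_life_hours


# First example: n = 300, W = 100 hours, mu = 10,000 hours.
s1 = dimensionless_wait(300, 100.0, 10000.0)
equilibrium_1 = math.exp(-s1)          # time-equilibrium survival probability

# Second example: n = 100, W = 24 hours, mu = 10,000 hours, T = 10,000 hours.
s2 = dimensionless_wait(100, 24.0, 10000.0)
equilibrium_2 = math.exp(-s2)          # value for t infinite, equation (3)
mu_h = 0.984                           # mu * h(t) as quoted in the text
finite_age_2 = math.exp(-s2 * mu_h)    # value at the actual system age, equation (2)

print(s1, round(equilibrium_1, 3))
print(s2, round(equilibrium_2, 3), round(finite_age_2, 3))
```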

Several global conclusions can be drawn from these curves. The most important is that the effects of finite $t$ are more important than the effects of finite $n$. This may be seen from the wide fluctuations of $\mathcal{L}$ as $t$ varies and the closeness* of x's and +'s to the smooth curve for $n = \infty$. The approach of $\mathcal{L}$ to its limiting value for $\alpha = 1/2$, as displayed in Figure 7, is monotonic increasing; this is because gamma components have decreasing hazard rates when $\alpha < 1$. Although we do not present the curve here, the same phenomenon has been seen for Weibull components with $\beta < 1$. As $\alpha$ (or $\beta$) gets larger there is a range of shape parameter for which the approach is monotonic decreasing, as shown in Figures 6, 7, 9. For still larger $\alpha$ or $\beta$ the curve oscillates before damping in its approach to the equilibrium value; the larger $\alpha$, the more oscillations are visible.

These oscillations were not expected, but they are genuine. Since hindsight is often 20/20, we now give an intuitive justification for the phenomenon. If the mean of the failure distribution of a component is large relative to its standard deviation (if the component has a small coefficient of variation), failures concentrated near the component mean life $\mu$ reduce the reliability, causing a relative minimum. After replacing the failed components, the reliability is increased, causing a maximum. But after an additional time $\mu$ the second generation of components will fail, causing a second minimum, etc. Thus we expect peaks to occur at values of $T$ that are multiples of $\mu$. The peaks get wider and shallower as $T$ increases, until failures are essentially "random" and the exponential limit takes effect. This situation is illustrated in Figure 10. The upper set of curves represents $f(t)$ and its convolutions (time to second failure, time to third failure, etc.). The distribution of $k$th failures peaks at $t = k\mu$; its standard deviation is of the order of $\mu\sqrt{k}$ times the coefficient of variation of $f$. Thus the peaks do get wider and shallower as $T$ increases. Another heuristic argument is illustrated by the lower curve in Figure 10, representing $h(t)$, the sum of the curves in the upper figure: it oscillates and then stabilizes to a constant value. But one observes from (1) that $\mathcal{L}$ is essentially a monotonic function of $h(t)$ ($L_1$ and $L_2$ affect the size of the oscillation, but have little effect on its location) and one observes from (2) that the asymptotic $\mathcal{L}$ for $n = \infty$ is a monotonic function of $h(t)$, with sense reversed: the peaks of $h(t)$ are mirrored into the troughs of $\mathcal{L}(t)$. It is well known that the coefficient of variation of the gamma and Weibull distributions decreases as $\alpha$ and $\beta$, respectively, increase.

FIGURE 10. Schematic representation of $f(t)$ (above) and $h(t)$ (below)

*A comparison of the two curves for $\alpha = 2$, Figures 2 and 3, indicates that the approach for $n \to \infty$ is faster in Figure 3 than in Figure 2. Both curves represent computer plots. We had intended to include only Figure 3, but, having discovered the discrepancy, found it advisable to include both. Clearly one of the computer programs used was in error. The program is being rewritten; a correct tabulation and plot will be furnished on request.

The oscillations increase the value of $T/\mu$ needed before one can be sure that the deviation of $\mathcal{L}$ from its limit is less than some specified value. For example, consider the curve of

$$e^{-\mu s h(t)} \quad\text{for}\quad e^{-s} = 0.35$$

when $f(t)$ is a gamma density. Table 1 is obtained by finding on these curves the time beyond which the value of $\mathcal{L}$ never deviates from 0.35 by more than 1% (i.e. 0.0035). Note that such a time as $T = 3.1\mu$ can be very large for highly reliable components. For example, if $\alpha = 12$, and $\theta = 1$ month, and $n = 256$, then on the average the system has 256 failures per year or one failure every 1.4 days. Yet the steady-state exponential limit is reached after 3.1 years! If $\alpha = 12$, and $\theta = 1$ year, and $n = 256$, then the system fails every 17 days, and the steady state is reached after 37 years! Do many systems last this long? If not, one should not be analyzing their reliability by means of the exponential assumption.
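
The two numerical cases above follow from $\mu = \alpha\theta$, a mean system waiting time of $\mu/n$, and the settling time $3.1\mu$ read from Table 1 for $\alpha = 12$. A short Python sketch of that arithmetic, with a hypothetical helper name and a 365-day year assumed for the conversion to days:

```python
def system_summary(alpha, theta_years, n, settle_in_mean_lives):
    """Return (component mean life, system MTBF, settling time), all in years."""
    mu = alpha * theta_years                  # component mean life, mu = alpha * theta
    mtbf_system = mu / n                      # mean time between system failures
    settle_time = settle_in_mean_lives * mu   # e.g. T = 3.1 mu for alpha = 12
    return mu, mtbf_system, settle_time


DAYS_PER_YEAR = 365.0   # assumption used only for the conversion to days

# alpha = 12, theta = 1 month, n = 256
mu_a, mtbf_a, settle_a = system_summary(12, 1.0 / 12.0, 256, 3.1)
# alpha = 12, theta = 1 year, n = 256
mu_b, mtbf_b, settle_b = system_summary(12, 1.0, 256, 3.1)

print(mtbf_a * DAYS_PER_YEAR, settle_a)   # days between failures, years to settle
print(mtbf_b * DAYS_PER_YEAR, settle_b)
```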

Table 2 illustrates how the mean life $\mu = \alpha\theta$ (for gamma components) enters the calculations. The first two lines were read from Figure 2. If $\theta = 15$ hours and $n = 256$, the MTBF of a component is 180 hours and there is a system failure every 42 minutes. If $\theta = 15$ years and $n = 256$, the MTBF of the system is 257 days; the last line of Table 2 indicates that steady state has not arrived after 165 years.
TABLE 1. Time for oscillations to die down as function of shape parameter

shape parameter    normalized time    coded time
  $\alpha$           $t/\alpha$       $e^{-t/\alpha}$    $t = T/\theta$

    1/2                 3.0              .050               1.5
    3/2                 1.2              .301               1.8
     2                  1.2              .301               2.3
     6                  1.7              .183              10.3
    12                  3.1              .045              37.6

TABLE 2. Effect of scale parameter $\theta$ on reliability calculations: gamma components, $\alpha = 12$

$e^{-t/\alpha}$                          .23          .40          .58
$\mathcal{L}$ (for $e^{-s}$ = .75)       .795         .663         .90
$t/\alpha$                               1.47         .92          .54

$\theta$ = 1 mo.,   $T$ =                17.6 mos.    11.0 mos.    6.5 mos.
$\theta$ = 15 hrs., $T$ =                265 hrs.     165 hrs.     98 hrs.
$\theta$ = 15 yrs., $T$ =                265 yrs.     165 yrs.     98 yrs.
APPLICATION OF TIME SERIES MODELS

George E. P. Box
University of Wisconsin
Madison, Wisconsin

1.  The  need  for  time  series  models 

Statistical models with which the user is perhaps most familiar are of a form such that for the t'th of n observations

$$y_t = f(x_t, \beta) + u_t \tag{1}$$

where $y_t$ is the t'th observed value, $x_t$ is a vector of k independent (input) variables, and $\beta$ is a vector of p parameters. The error term $u_t$ has zero mean and is often assumed to be distributed

i) normally,
ii) with constant variance $\sigma^2$ independent of t,
iii) independently of any other error $u_s$ ($s \ne t$).

Such models include those customarily associated with analysis of variance as well as with regression analysis. The practitioner is however frequently involved with data which occurs serially in time or space. Thus $y_1, y_2, \ldots, y_n$ might be successive observations of the positions of a missile observed every second, or of recruitment to the Army observed every month. For such data the errors are unlikely to be independent. A disturbance occurring at time t is likely to influence not only an observation made at time t but also subsequent observations at times t+1, t+2, etc. In such a case the errors $u_t$ may be serially correlated.

Now statisticians have a great deal of experience with building models of the form of (1) and have available a battery of techniques which are appropriate when the assumptions mentioned above (in particular the independence assumption) are true. Most notably, maximum likelihood estimates of the parameters $\beta$ may then be obtained by use of the method of least squares (i.e. standard regression analysis).

It might therefore be asked whether serial correlation of errors will seriously invalidate these standard methods. Statisticians have traditionally seemed to worry most about the effects of non-normality rather than the effects of stochastic dependence of errors. It is therefore relevant to consider how badly the effects from violation of serial independence assumptions compare with those from non-normality.

Table 1 shows the result of a sampling experiment (Box 1976, [1]) in which two samples of 10 observations from identical populations of the forms indicated were taken and subjected to a t-test (t) and a Mann-Whitney test (MW). The sampling was repeated 1,000 times and the number of results significant at the 5 percent point was recorded. Ideally, this number should be 50 (that is, 5 percent of the total) but it has a standard deviation of about 7 because of sampling errors. More accurate results may be obtained by taking larger samples or by analytical procedures; however, since there is no practical difference between a significance level of say 4 percent and 7 percent, the present investigation suffices for illustration. Autocorrelation between adjacent values was introduced by generating observations so that $\rho_1$, the first serial correlation, had values of -0.4 and +0.4.

The frequencies on the left are those obtained for a nonrandomized test. The frequencies on the right are obtained when the observations were randomly allocated to the two groups.
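
An experiment of this kind is easy to imitate. The Python sketch below is not a reproduction of the experiment in [1]: it assumes normal shocks, a first order moving average as the way of obtaining $\rho_1 = 0.4$, the t-test only, and 2,000 repetitions; all names and the seed are hypothetical. It compares the rejection rate of a nominal 5 percent two-sample t-test for independent data, for serially correlated data split in time order, and for the same kind of data randomly allocated to the two groups.

```python
import math
import random

T_CRIT_18 = 2.101   # two-sided 5 percent point of Student's t with 18 d.f.


def ma1_series(rng, length, coef):
    """Series a_t + coef * a_(t-1); its lag-one correlation is coef / (1 + coef**2)."""
    shocks = [rng.gauss(0.0, 1.0) for _ in range(length + 1)]
    return [shocks[t + 1] + coef * shocks[t] for t in range(length)]


def t_statistic(x, y):
    """Pooled two-sample t statistic for two samples of equal size."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    ss_x = sum((v - mean_x) ** 2 for v in x)
    ss_y = sum((v - mean_y) ** 2 for v in y)
    pooled_var = (ss_x + ss_y) / (2 * n - 2)
    return (mean_x - mean_y) / math.sqrt(pooled_var * 2.0 / n)


def rejection_rate(coef, randomize, trials=2000, seed=0):
    """Fraction of trials in which the nominal 5 percent t-test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        series = ma1_series(rng, 20, coef)
        if randomize:
            rng.shuffle(series)          # random allocation to the two groups
        if abs(t_statistic(series[:10], series[10:])) > T_CRIT_18:
            rejections += 1
    return rejections / trials


rate_independent = rejection_rate(0.0, randomize=False)   # rho_1 = 0
rate_dependent = rejection_rate(0.5, randomize=False)     # rho_1 = 0.4
rate_randomized = rejection_rate(0.5, randomize=True)     # rho_1 = 0.4, randomized
print(rate_independent, rate_dependent, rate_randomized)
```

The quoted standard deviation of about 7 is the binomial value, `math.sqrt(1000 * 0.05 * 0.95)`.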

As is to be expected the significance level of the t-test is affected remarkably little by the drastic changes made in the marginal parent distribution, changes for which the "distribution-free" test provides insurance. Unfortunately, of course, both tests are equally impaired by error dependence unless randomization is introduced, when they do about equally well. The point is, of course, that it is the act of randomization that is of major importance here, not the introduction of the non-parametric test function.

In the situations we discuss in this paper, randomization is not possible and it is evident that in this case we face a serious problem if errors are serially correlated.

As a further illustration consider the following regression model used by Coen, Gomme & Kendall (1969), [16] to model quarterly data in which $y_t$ is a stock market index, $x_{1,t-6}$ is a measure of U.K. car production lagged 6 periods & $x_{2,t-7}$ is a commodity index lagged 7 periods

$$y_t = \alpha + \beta_1 x_{1,t-6} + \beta_2 x_{2,t-7} + u_t$$

On the assumption of error independence, for which ordinary least squares is appropriate, estimates of $\beta_1$ and $\beta_2$ were calculated. These were 14.1 and -9.9 times their standard errors, indicating overwhelming significance. On this basis the authors of the paper believed that they could forecast future stock market prices. It was subsequently pointed out however (Box & Newbold 1971, [10]), that, as soon as proper provision was made for serial dependence in the errors, the apparent relationships disappeared.

2.  ARIMA  time  series  models 

Models having their origins in the work of Yule [22], Slutsky [19] & Yaglom [21], which have been found to be of great practical value in representing serial dependence, employ stochastic difference equations of the form

$$ \phi(B) u_t = \theta(B) a_t \tag{2} $$

where $\{u_t\}$ is the sequence to be modelled, $B$ is the backshift operator such that $B u_t = u_{t-1}$,

$\phi(B)$, called the autoregressive operator, is such that

$$ \phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \ldots - \phi_p B^p $$

$\theta(B)$, called the moving average operator, is such that

$$ \theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \ldots - \theta_q B^q $$

and $\{a_t\}$ is "white noise", that is, a source of independent random "shocks" roughly normally distributed about zero with constant variance $\sigma_a^2$.

The form of equation needed is often rather simple. Thus Fig. 1 shows a number of real time series together with the fitted stochastic models which have been found to represent them.

As illustrated by these examples, models for nonstationary time series may often be built by fitting a stationary model to a differenced series. Thus (see Figure 1(b)) the first difference $u_t - u_{t-1} = (1-B)u_t = \nabla u_t$ of the stock price series is represented by a stationary first order moving average model yielding the overall model

$$ (1-B) u_t = (1 - \theta B) a_t \quad \text{with } \theta = -.1 $$

i.e. $u_t = u_{t-1} + a_t + .1\, a_{t-1}$

Models  of  this  kind  have  been  used  successfully  to  solve  a wide 
variety  of  problems  including 

Forecasting  future  values  of  a series. 

Smoothing  Series  (including  seasonal  adjustment  of  series). 
Intervention  Analysis  (detecting  & estimating  effects  of 
system  changes  buried  in  dependent  noise). 

Control  of  Systems. 

We now illustrate some of these applications with examples.

[Figure 1. Real time series with fitted stochastic models. Recoverable panels: a chemical process series, with fitted model $(1 - .9B)u_t = 1.45 + (1 - .6B)a_t$; (c) Series arising in a control problem with ...; (d) Seasonal series: logs of monthly passenger totals in international air travel, forecasts for up to 36 months ahead all made from arbitrary origin July 1957.]

3.  Estimating future location of a missile

The following is taken from an MRC technical report (Box & Pallesen 1978, [11]) which describes the modelling of some missile data (made available by Mr. Paul Thrasher of the Quality Evaluation Division of White Sands Missile Range). It shows how a stochastic difference equation model may be built & used to predict the future location of the missile. Details of the calculations will be found in the book by Box & Jenkins, indicated here by B&J [8].


A model

$$ \phi_p(B) \nabla^d z_t = \theta_q(B) a_t \tag{3} $$

is said to be an ARIMA (Autoregressive Integrated Moving Average) model of order (p,d,q) if $\phi_p(B)$ & $\theta_q(B)$ are polynomials in $B$ of degrees p & q respectively having zeros outside the unit circle.

3.1 Identification, Fitting and Checking of Model

The data series we are considering consists of 246 consecutive observations of the x-coordinate of a missile trajectory. The observations, $z_t$, $t = 1, 2, \ldots, 246$, were made with constant sampling interval and there are no missing or obviously aberrant values.

Modeling such a time series is conceived of as an iterative process involving three stages: identification, fitting and diagnostic checking. Identification is first performed along the lines of Chapter 6 in B&J [8]. Plotting the data (Figure 2a) shows a smooth nonstationary series, whose autocorrelation function (Figure 2(b)) dies out extremely slowly. After differencing three times the series $\nabla^3 z_t$ appears stationary and its sample autocorrelation and partial autocorrelation function (Figures 2c and 2d) suggest that a reasonable model for $\nabla^3 z_t$ should include a few moving average parameters of low order. A clear identification is not possible at this point but a stochastic difference equation model of the form

$$ \nabla^3 z_t = (1 - \theta_1 B - \theta_2 B^2 - \theta_3 B^3) a_t \tag{4} $$

is considered worthy of being tentatively entertained.

Figure 2b. Sample autocorrelation function of original series
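The identification step can be sketched in code. The missile data are not reproduced in this paper, so the example below simulates a series from a hypothetical (0,3,3) model with made-up moving average parameters, then differences it three times and computes sample autocorrelations. The helper names and parameter values are assumptions for illustration only.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag of a series."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    c0 = d @ d
    return np.array([1.0] + [(d[k:] @ d[:-k]) / c0 for k in range(1, max_lag + 1)])


def simulate_ima_3_q(n, theta, seed=0):
    """Simulate n values with third difference (1 - theta_1 B - ...) a_t.

    theta holds hypothetical moving average parameters; the shocks are
    standard normal white noise.
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(n + len(theta))
    ma = np.concatenate(([1.0], -np.asarray(theta, dtype=float)))
    w = np.convolve(a, ma, mode="valid")       # stationary MA series, length n
    return np.cumsum(np.cumsum(np.cumsum(w)))  # integrate three times


if __name__ == "__main__":
    z = simulate_ima_3_q(246, [0.5, -0.3, 0.2], seed=1)
    print("acf of z, lags 1-5       :", np.round(sample_acf(z, 5)[1:], 2))
    w = np.diff(z, n=3)                        # third difference
    print("acf of diff^3 z, lags 1-5:", np.round(sample_acf(w, 5)[1:], 2))
```

The autocorrelations of the undifferenced series die out very slowly, whereas those of the third difference are appreciable only at the first few lags, which is the pattern that points to low order moving average terms.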

Fitting this model by the method of Chapter 7 in B&J [8] gives the parameter estimates, residual sum of squares (RSS) and the residual mean square (RMS) listed in Table 2. If this model is adequate the RMS value provides an estimate of the variance, $\sigma_a^2 = E(a_t^2)$, which is the one step ahead forecast error variance.

Diagnostic checking (Chapter 8 in B&J [8]) involves examination of the residuals (the estimated $a_t$'s) left after fitting this model to seek for departures from the "white noise" form. One way of doing this is to submit the residual ($\hat a_t$) sequence to the identification procedure previously applied to $\nabla^3 z_t$. In fact the autocorrelation function of the residuals $\hat a_t$, Figure 3(a), suggests that while most of the dependence is being accounted for by the model, some significant low order autocorrelations remain, indicating some additional $\theta$ parameters are needed. Notice that the diagnostic checking of the model (4) reveals model inadequacy and also identifies in which way the model should be modified.

After another cycle the model

$$ \nabla^3 z_t = (1 - \theta_1 B - \theta_2 B^2 - \theta_3 B^3 - \theta_4 B^4 - \theta_5 B^5) a_t \tag{5} $$

is identified, and it fits the data very well, leaving residuals, Figure 3(b), which look like white noise. Figure 3(c) shows the sample autocorrelations of the residuals. This fitted model along with some other contenders is listed in Table 2. Additional models are fitted as a check that additional parameters would not substantially improve matters (overfitting), and also to demonstrate that the chosen number of differencings is appropriate.

Table 2

Models fitted to Missile data (x-coordinate)

(p,d,q)   Model                                                                                   RSS    RMS   (DF)

(0,2,2)   $\nabla^2 z_t = (1 - \theta_1 B - \theta_2 B^2) a_t$                                     410.
          (Moduli of roots: 1.39; 1.39, i.e. stable)

(0,2,3)   $\nabla^2 z_t = (1 - \theta_1 B - \theta_2 B^2 - \theta_3 B^3) a_t$                      348.
          (Moduli of roots: 1.14; 1.14; 1.83, i.e. stable)

(0,3,4)   $\nabla^3 z_t = (1 - \theta_1 B - \theta_2 B^2 - \theta_3 B^3 - \theta_4 B^4) a_t$       203.   .85   (239)
          $\theta_1 = 1.938$    {1.99, 1.89}
          $\theta_2 = -1.030$   {-.95, -1.11}
          $\theta_3 = -.146$    {-.02, -.27}
          $\theta_4$            {.25, .09}
          (Moduli of roots: 1.13; 1.13; 1.59; 2.86)

(0,3,5)   $\nabla^3 z_t = (1 - \theta_1 B - \theta_2 B^2 - \theta_3 B^3 - \theta_4 B^4 - \theta_5 B^5) a_t$   192.   .81   (238)
          $\theta_1 = 2.078$    {2.11, 2.04}
          $\theta_2 = -1.291$   {-1.21, -1.37}
          $\theta_3 = -.115$    {.00, -.23}
          $\theta_4 = .395$     {.51, .28}
          $\theta_5 = -.131$    {-.04, -.22}
          (Moduli of roots: 1.11; 1.11; 1.79; 1.79; 1.93)

3.2 Checking zeros of $\theta(B)$

Regarding the operator

$$ \theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \theta_3 B^3 - \ldots \tag{6} $$

as a polynomial in $B$, it is shown in B&J [8] that a necessary requirement for a sensible model is that the zeros of this polynomial be outside the unit circle (invertibility property).

It is important to check this and the moduli of the roots given in Table 2 indicate that the model is indeed invertible.
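The invertibility check is easy to carry out numerically. The sketch below uses NumPy's general polynomial root finder on the rounded (0,3,5) estimates quoted in Table 2; the function name is ours, and because the estimates are rounded to three decimals the moduli should come out close to, but not necessarily identical with, those printed in the table.

```python
import numpy as np


def ma_root_moduli(theta):
    """Sorted moduli of the zeros of theta(B) = 1 - theta_1 B - ... - theta_q B^q."""
    # Coefficients in increasing powers of B.
    coeffs = np.concatenate(([1.0], -np.asarray(theta, dtype=float)))
    # np.roots expects the highest power first.
    roots = np.roots(coeffs[::-1])
    return np.sort(np.abs(roots))


# Rounded estimates for the (0,3,5) model, as listed in Table 2.
theta_035 = [2.078, -1.291, -0.115, 0.395, -0.131]
moduli = ma_root_moduli(theta_035)

if __name__ == "__main__":
    print("moduli of zeros:", np.round(moduli, 2))
    print("invertible     :", bool(np.all(moduli > 1.0)))
```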

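3.3 Forecasts

Accepting that the (0, 3, 5) model provides an adequate representation of the system (with the (0, 3, 4) model as a close runner-up) the forecasts produced are most easily calculated from the difference equation itself (see Chapter 5 of B&J [8]). From Equation (5) we find

$$ z_t = 3z_{t-1} - 3z_{t-2} + z_{t-3} + a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \theta_3 a_{t-3} - \theta_4 a_{t-4} - \theta_5 a_{t-5}. \tag{7} $$

Then by taking conditional expectations of $z_{t+1}, z_{t+2}, \ldots, z_{t+\ell}$ at origin t (as described in B&J p. 130 [8]) the $1, 2, 3, \ldots, \ell$ step ahead forecasts are:

$$
\begin{aligned}
\hat z_t(1) &= 3z_t - 3z_{t-1} + z_{t-2} - \theta_1 a_t - \theta_2 a_{t-1} - \theta_3 a_{t-2} - \theta_4 a_{t-3} - \theta_5 a_{t-4} \\
\hat z_t(2) &= 3\hat z_t(1) - 3z_t + z_{t-1} - \theta_2 a_t - \theta_3 a_{t-1} - \theta_4 a_{t-2} - \theta_5 a_{t-3} \\
\hat z_t(3) &= 3\hat z_t(2) - 3\hat z_t(1) + z_t - \theta_3 a_t - \theta_4 a_{t-1} - \theta_5 a_{t-2} \\
\hat z_t(4) &= 3\hat z_t(3) - 3\hat z_t(2) + \hat z_t(1) - \theta_4 a_t - \theta_5 a_{t-1} \\
\hat z_t(5) &= 3\hat z_t(4) - 3\hat z_t(3) + \hat z_t(2) - \theta_5 a_t \\
\hat z_t(\ell) &= 3\hat z_t(\ell-1) - 3\hat z_t(\ell-2) + \hat z_t(\ell-3), \qquad \ell \ge 6
\end{aligned}
\tag{8}
$$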
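The recursion for the forecasts translates directly into a short routine. This is a minimal sketch under stated assumptions: `z` holds the observations up to the forecast origin, `a` holds the corresponding residuals (estimated shocks) aligned with `z`, and future shocks are replaced by their expectation, zero. The function name and argument layout are ours, not those of any particular program.

```python
def forecast_from_difference_equation(z, a, theta, max_lead):
    """Forecasts for leads 1..max_lead from a (0,3,q) model.

    z     : observations ..., z_{t-1}, z_t (at least three values)
    a     : residuals ..., a_{t-1}, a_t (at least len(theta) values)
    theta : moving average parameters theta_1, ..., theta_q
    """
    history = list(z[-3:])          # z_{t-2}, z_{t-1}, z_t, then forecasts
    q = len(theta)
    forecasts = []
    for lead in range(1, max_lead + 1):
        ma_part = 0.0
        # theta_j multiplies a_{t+lead-j}; only shocks up to time t are known.
        for j in range(lead, q + 1):
            ma_part -= theta[j - 1] * a[len(a) - 1 - (j - lead)]
        value = 3 * history[-1] - 3 * history[-2] + history[-3] + ma_part
        forecasts.append(value)
        history.append(value)
    return forecasts


if __name__ == "__main__":
    theta = [2.078, -1.291, -0.115, 0.395, -0.131]
    z = [float(t * t) for t in range(10)]   # made-up quadratic "trajectory"
    a = [0.0] * 10                          # no shocks
    print(forecast_from_difference_equation(z, a, theta, 3))
```

When all residuals are zero the third difference of the forecasts vanishes, so a quadratic history is simply continued; nonzero residuals at the origin adjust the first few forecasts.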


Table 3

FORECASTS

Obs #   Actual value   Model (0,3,5)   Model (0,3,4)

201     13225.08       13224.78        13224.80
202     13306.74       13305.80        13305.94
203     13387.51       13386.70        13386.77
204     13468.42       13467.20        13467.23
205     13549.74       13547.33        13547.34
206     13628.61       13627.10        13627.08
207     13708.78       13706.49        13706.46
208     13788.67       13785.52        13785.48
209     13868.21       13864.18        13864.14
210     13947.30       13942.47        13942.44

In practice of course this is done automatically by the computer.

For illustration, the forecasts produced by this model with an origin (for all forecasts) at t = 200 are shown in Figure 4. It will be noticed that the forecasts are in very close agreement with the actual values. Even the 10-step ahead forecast is hardly distinguishable from the actually observed value.

Table 3 lists the actual values and the forecasts numerically. The forecasts produced by the models (0, 3, 4) and (0, 3, 3) are also very good and they are included for comparison.

3.4 Error of Forecasts

In order to determine the error of the forecasts, it is helpful to write the model (3) in random shock form. Thus formally

$$ z_t = \frac{\theta_q(B)}{\phi_p(B)\nabla^d}\, a_t = \psi(B) a_t \tag{9} $$

where

$$ \psi(B) = 1 + \psi_1 B + \psi_2 B^2 + \ldots \tag{10} $$

And it is shown in B&J [8] p. 126-128 that the lead $\ell$ forecast error is

$$ e_t(\ell) = z_{t+\ell} - \hat z_t(\ell) = a_{t+\ell} + \psi_1 a_{t+\ell-1} + \ldots + \psi_{\ell-1} a_{t+1} \tag{11} $$

Whence the variance of the forecast error is

$$ \mathrm{var}[e_t(\ell)] = E(z_{t+\ell} - \hat z_t(\ell))^2 = (1 + \psi_1^2 + \psi_2^2 + \ldots + \psi_{\ell-1}^2)\,\sigma_a^2 \tag{12} $$

For the fitted model (5) the $\psi$-weights are calculated by equating coefficients in (13), B&J [8] pp. 132-134.

$$ (1 - 3B + 3B^2 - B^3)(1 + \psi_1 B + \psi_2 B^2 + \ldots) = (1 - \theta_1 B - \theta_2 B^2 - \theta_3 B^3 - \theta_4 B^4 - \theta_5 B^5) \tag{13} $$

Specifically we find the $\psi_j$ values given in Table 4. Using the estimated $\hat\sigma_a^2 = .81$ from Table 2, the variance of the forecast error is given for $\ell = 1, 2, \ldots, 10$. The last column in Table 4 lists ±2 standard errors, corresponding to approximately 95% probability intervals for the forecasts. We note that these probability intervals are so narrow that they cannot be distinguished from the forecasts themselves in a plot like Figure 4.
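Equating coefficients of powers of $B$ on the two sides of the identity $(1-B)^3\psi(B) = \theta(B)$ gives the recursion $\psi_j = 3\psi_{j-1} - 3\psi_{j-2} + \psi_{j-3} - \theta_j$, with $\psi_0 = 1$ and $\theta_j = 0$ beyond the last moving average parameter. The sketch below applies this recursion and accumulates the forecast error variance. It uses the rounded estimates quoted in Table 2, and the function names are ours; no attempt is made to reproduce the entries of Table 4, which need not agree with values computed from rounded estimates.

```python
def psi_weights(theta, n_weights):
    """psi_0, ..., psi_{n_weights-1} for a (0,3,q) model."""
    c = [3.0, -3.0, 1.0]            # from 1 - 3B + 3B^2 - B^3
    psi = [1.0]
    for j in range(1, n_weights):
        value = -theta[j - 1] if j <= len(theta) else 0.0
        for i, ci in enumerate(c, start=1):
            if j - i >= 0:
                value += ci * psi[j - i]
        psi.append(value)
    return psi


def forecast_error_variance(theta, sigma2, max_lead):
    """var[e_t(l)] = sigma2 * (1 + psi_1^2 + ... + psi_{l-1}^2), l = 1..max_lead."""
    psi = psi_weights(theta, max_lead)
    variances, running = [], 0.0
    for lead in range(max_lead):
        running += psi[lead] ** 2
        variances.append(sigma2 * running)
    return variances


if __name__ == "__main__":
    theta = [2.078, -1.291, -0.115, 0.395, -0.131]   # rounded (0,3,5) estimates
    for lead, v in enumerate(forecast_error_variance(theta, 0.81, 10), start=1):
        print(lead, round(v, 2), "+/-", round(2 * v ** 0.5, 1))
```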

The above is all that is needed to compute forecasts and the standard deviations of forecast errors. What appears in the following sections is not necessary for calculation, but does illuminate the nature of the projection process.

3.5 Integral forms

As discussed in Chapter 4 pp. 103-114 of B&J [8], the equivalent integrated form of the model of Equation (5) is of some interest also. In this form the observations appear as linear aggregates of past random shocks, their difference, sum, sum of sums, etc., plus a new random shock. Specifically the integrated model form

$$ z_t = \lambda_{-2}\nabla a_{t-1} + \lambda_{-1} a_{t-1} + \lambda_0 S a_{t-1} + \lambda_1 S^2 a_{t-1} + \lambda_2 S^3 a_{t-1} + a_t \tag{14} $$

degenerates to different models from Table 2 when certain of the $\lambda$-coefficients are taken to be zero. Table 5 links models from Table 2 to their equivalent integrated forms, and lists estimated $\lambda$ coefficients which can be calculated from the estimated $\theta$'s. Conversion formulas for the models under consideration are given in Table 6, but can more generally be found from equating coefficients in Equation 4.3.21, p. 112 in B&J [8].

Table 4

Forecast errors

Lead $\ell$   var[$e_t(\ell)$]   Approx. 95% Probability Interval

 1             .81               ± 1.8
 2            1.38               ± 2.3
 3            1.81               ± 2.7
 4            3.74               ± 3.9
 5            5.98               ± 4.9
 6            9.15               ± 6.0
 7           13.62               ± 7.4
 8           19.70               ± 8.9
 9           27.77               ± 10.5
10           38.20               ± 12.4

3.6 The eventual forecast function

One question of interest is what function is being selected for projecting the forecasts, i.e. what is the forecast function. It is shown in B&J [8] p. 139 that depending on the nature of the left hand operator, the model (3) could call for forecasts lying on an updating function that could consist of any combination of polynomials, exponentials and sine and cosine waves. What forecast function does the model imply for the present fitted (0, 3, 5) model?

The eventual forecast function for the (0, 3, 5) model satisfies the difference equation

$$ \nabla^3 \hat z_t(\ell) = 0 \tag{15} $$

which has as its solution a polynomial in $\ell$ of 2nd degree

$$ \hat z_t(\ell) = b_0^{(t)} + b_1^{(t)}\ell + b_2^{(t)}\ell^2 \tag{16} $$

and applies for $\ell > q - p - d$ (i.e. $\ell > 2$).

In other words the model (0, 3, 5) implies that the forecasted future values from some time origin t will, except for slight deviations in the first two lead-times, follow a quadratic curve. (The (0, 3, 4) model which fits slightly less well implies that only one initial deviation occurs, while the (0, 3, 3) model implies that all forecasts lie on a quadratic curve).

Although  the  forecasts  are  best  calculated  directly  from  the 
difference  equation  as  above  it  is  enlightening  to  further  consider 
their  nature. 

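As the origin of forecasts is advanced the calculating process requires that coefficients $b_0$, $b_1$ and $b_2$ are sequentially updated.

Table 6

Conversion formulae, $\theta$ to $\lambda$.

Model     Formulae

(0,2,2)   $\lambda_0 = 1 + \theta_2$
          $\lambda_1 = 1 - \theta_1 - \theta_2$

(0,2,3)   $\lambda_{-1} = -\theta_3$
          $\lambda_0 = 1 + \theta_2 + 2\theta_3$
          $\lambda_1 = 1 - \theta_1 - \theta_2 - \theta_3$

(0,3,3)   $\lambda_0 = 1 - \theta_3$
          $\lambda_1 = 1 + \theta_2 + 2\theta_3$
          $\lambda_2 = 1 - \theta_1 - \theta_2 - \theta_3$

(0,3,4)   $\lambda_{-1} = \theta_4$
          $\lambda_0 = 1 - \theta_3 - 3\theta_4$
          $\lambda_1 = 1 + \theta_2 + 2\theta_3 + 3\theta_4$
          $\lambda_2 = 1 - \theta_1 - \theta_2 - \theta_3 - \theta_4$

(0,3,5)   $\lambda_{-2} = -\theta_5$
          $\lambda_{-1} = \theta_4 + 4\theta_5$
          $\lambda_0 = 1 - \theta_3 - 3\theta_4 - 6\theta_5$
          $\lambda_1 = 1 + \theta_2 + 2\theta_3 + 3\theta_4 + 4\theta_5$
          $\lambda_2 = 1 - \theta_1 - \theta_2 - \theta_3 - \theta_4 - \theta_5$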

For example the updating formulae for the (0, 3, 5) model can be found directly by relating (16) to the forecasting formula from the integrated model.

We find that the updating formulae derived below are

$$
\begin{aligned}
b_0^{(t)} &= b_0^{(t-1)} + b_1^{(t-1)} + b_2^{(t-1)} + \lambda_0 a_t \\
b_1^{(t)} &= b_1^{(t-1)} + 2 b_2^{(t-1)} + (\lambda_1 + \tfrac{1}{2}\lambda_2) a_t \\
b_2^{(t)} &= b_2^{(t-1)} + \tfrac{1}{2}\lambda_2 a_t
\end{aligned}
\tag{17}
$$

Note that the first terms on the right of (17) simply allow for movement of the origin without changing the polynomial. The term involving the last random shock $a_t$ appropriately updates the coefficient.
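The updating step can be written out as a small routine. The sketch below is illustrative only: the function names are ours, the $\lambda$'s are computed from the rounded (0,3,5) estimates using the Table 6 conversion formulae, and the starting coefficients are made up. It also shows the property noted above, that with a zero shock the update merely shifts the origin of the same quadratic.

```python
def lambdas_035(theta):
    """lambda_{-2}, lambda_{-1}, lambda_0, lambda_1, lambda_2 for a (0,3,5) model."""
    t1, t2, t3, t4, t5 = theta
    return (-t5,
            t4 + 4 * t5,
            1 - t3 - 3 * t4 - 6 * t5,
            1 + t2 + 2 * t3 + 3 * t4 + 4 * t5,
            1 - t1 - t2 - t3 - t4 - t5)


def update_coefficients(b, a_t, lam0, lam1, lam2):
    """Advance (b0, b1, b2) from origin t-1 to origin t given the new shock a_t."""
    b0, b1, b2 = b
    return (b0 + b1 + b2 + lam0 * a_t,
            b1 + 2 * b2 + (lam1 + 0.5 * lam2) * a_t,
            b2 + 0.5 * lam2 * a_t)


def eventual_forecast(b, lead):
    """Value of the quadratic forecast function at the given lead."""
    b0, b1, b2 = b
    return b0 + b1 * lead + b2 * lead ** 2


if __name__ == "__main__":
    theta = [2.078, -1.291, -0.115, 0.395, -0.131]
    _, _, lam0, lam1, lam2 = lambdas_035(theta)
    b_old = (1.0, 2.0, 3.0)                      # hypothetical coefficients
    b_new = update_coefficients(b_old, 0.0, lam0, lam1, lam2)
    # With a zero shock, lead l from the new origin equals lead l+1 from the old.
    print(eventual_forecast(b_new, 3), eventual_forecast(b_old, 4))
```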

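The updating formulae (17) are derived as follows. We have from Equation (14) that

$$ z_{t+\ell} = \lambda_{-2}\nabla a_{t+\ell-1} + \lambda_{-1} a_{t+\ell-1} + \lambda_0 S a_{t+\ell-1} + \lambda_1 S^2 a_{t+\ell-1} + \lambda_2 S^3 a_{t+\ell-1} + a_{t+\ell} \tag{18} $$

Assuming $\ell > 2$ and taking expectations at origin t we find

$$
\begin{aligned}
\hat z_t(\ell) &= E(\lambda_0 S a_{t+\ell-1}) + E(\lambda_1 S^2 a_{t+\ell-1}) + E(\lambda_2 S^3 a_{t+\ell-1}) \\
&= (\lambda_0 S a_t) + (\lambda_1 S^2 a_{t-1} + \ell\lambda_1 S a_t) + \left(\lambda_2 S^3 a_{t-2} + (\ell+1)\lambda_2 S^2 a_{t-1} + \tfrac{\ell(\ell+1)}{2}\lambda_2 S a_t\right) \\
&= (\lambda_0 S a_t + \lambda_1 S^2 a_{t-1} + \lambda_2 S^2 a_{t-1} + \lambda_2 S^3 a_{t-2}) \\
&\quad + \ell\,(\lambda_1 S a_t + \lambda_2 S^2 a_{t-1} + \tfrac{1}{2}\lambda_2 S a_t) \\
&\quad + \ell^2\,(\tfrac{1}{2}\lambda_2 S a_t)
\end{aligned}
\tag{19}
$$

The coefficients $b_0^{(t)}$, $b_1^{(t)}$, $b_2^{(t)}$ in Equation (16) are now identified

$$
\begin{aligned}
b_0^{(t)} &= \lambda_0 S a_t + \lambda_1 S^2 a_{t-1} + \lambda_2 S^2 a_{t-1} + \lambda_2 S^3 a_{t-2} \\
b_1^{(t)} &= \lambda_1 S a_t + \lambda_2 S^2 a_{t-1} + \tfrac{1}{2}\lambda_2 S a_t \\
b_2^{(t)} &= \tfrac{1}{2}\lambda_2 S a_t
\end{aligned}
\tag{20}
$$

Now it is seen that (17) can be rewritten as (20).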

3.7  How  are  the  data  used  in  the  forecast? 

Still another way to interpret the forecasts is as a weighted sum of previous observations. Writing (5) as

$$ \theta^{-1}(B)\nabla^3 z_t = \pi(B) z_t = a_t \tag{21} $$

where

$$ \pi(B) = 1 - \pi_1 B - \pi_2 B^2 - \ldots \tag{22} $$

we find that

$$ \hat z_t(\ell) = \pi_1 \hat z_t(\ell-1) + \pi_2 \hat z_t(\ell-2) + \ldots \tag{23} $$

where $\hat z_t(-h)$ is taken to mean $z_{t-h}$ for $h = 0, 1, 2, \ldots$. The $\pi$-weights can be found by equating coefficients in the following identity after the $\theta$-estimates have been substituted:

$$ \nabla^3 = \theta(B)\,\pi(B) $$

$$ (1 - 3B + 3B^2 - B^3) = (1 - \theta_1 B - \theta_2 B^2 - \theta_3 B^3 - \theta_4 B^4 - \theta_5 B^5)(1 - \pi_1 B - \pi_2 B^2 - \pi_3 B^3 - \ldots) \tag{24} $$

The $\pi$-weights (also denoted by $\pi_j^{(1)}$) are given in Figure 5; thus for example

$$
\begin{aligned}
\hat z_t(1) = {} & .922\,z_t + .207\,z_{t-1} + .355\,z_{t-2} - .039\,z_{t-3} \\
& - .068\,z_{t-4} - .171\,z_{t-5} - .149\,z_{t-6} - .143\,z_{t-7} \\
& - .107\,z_{t-8} - .079\,z_{t-9} - .04\,z_{t-10} - .018\,z_{t-11} \\
& + .000\,z_{t-12} + .027\,z_{t-13} + .041\,z_{t-14} + .04\,z_{t-15} \\
& + .048\,z_{t-16} + .044\,z_{t-17} + .036\,z_{t-18} + .026\,z_{t-19} + \ldots
\end{aligned}
\tag{25}
$$
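Equating coefficients in the identity $\nabla^3 = \theta(B)\pi(B)$ gives a simple recursion for the $\pi$-weights, sketched below. The function names are ours and the rounded (0,3,5) estimates from Table 2 are used, so the first few weights agree with those quoted in the text, while later weights, which are sensitive to rounding of the estimates, need not agree to the printed precision.

```python
def pi_weights(theta, n_weights):
    """pi_1, ..., pi_{n_weights} with (1-B)^3 = theta(B) * (1 - pi_1 B - pi_2 B^2 - ...)."""
    d = [1.0, -3.0, 3.0, -1.0]       # coefficients of (1-B)^3
    g = [1.0]                        # g_j is the coefficient of B^j in pi(B)
    for j in range(1, n_weights + 1):
        value = d[j] if j < len(d) else 0.0
        for i in range(1, min(j, len(theta)) + 1):
            value += theta[i - 1] * g[j - i]
        g.append(value)
    return [-gj for gj in g[1:]]


def one_step_forecast(z, weights):
    """Weighted sum pi_1 z_t + pi_2 z_{t-1} + ... over the available history."""
    recent_first = list(reversed(z))
    return sum(w * x for w, x in zip(weights, recent_first))


if __name__ == "__main__":
    theta = [2.078, -1.291, -0.115, 0.395, -0.131]
    w = pi_weights(theta, 300)
    print([round(x, 3) for x in w[:3]])
    print("sum of weights:", round(sum(w), 6))
```

Because $\nabla^3$ vanishes at $B = 1$ the weights sum to one, so the forecast is a genuine weighted average of past observations, with some weights negative.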

The two step ahead forecast can be found similarly by replacing $z_t$ by $\hat z_t(1)$ and $z_{t-1}$ by $z_t$ and so on for forecasts with higher lead times. However these forecasts may also be expressed directly as weighted sums of the observations $z_t, z_{t-1}, \ldots$. The weights $\pi^{(2)}$ and $\pi^{(3)}$ corresponding to the two and three step ahead forecasts respectively, are also shown in Figure 5. In the remainder of this paper a brief outline is presented of two other important applications of time series modelling.

Intervention  Analysis 

We frequently need to detect & estimate possible changes in the functioning of a system affected by known interventions.

For example Figure 6 shows monthly averages for ozone in parts per hundred million (p.p.h.m.) measured in downtown Los Angeles (Box & Tiao 1975 [13]). It is known that in January 1960 a law (rule 63) was put into effect whereby the amount of reactive hydrocarbons in gasoline sold throughout L.A. county was reduced. Can a change be detected at this point in the series? If so how large is it?


Furthermore  modified  engines  were  made  compulsory  for  new  cars 
introduced  after  1966.  Can  any  effect  be  detected  which  might 
plausibly  be  related  to  this  intervention? 

Standard statistical procedures will certainly be invalidated for examples of this kind because

(a)  the  noise  is  highly  dependent  (&  in  this  case  seasonal). 

(b)  the  effect  of  changes  made  may  not  be  immediately  felt  but 
may  have  dynamic  characteristics. 

Difference equation models of the form

$$ y_t = \sum_i \frac{\omega_i(B)}{\delta_i(B)}\, x_{it} + \frac{\theta(B)}{\phi(B)}\, a_t $$

can take account of both difficulties.

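For the Ozone data a model was developed of the form

$$ y_t = \omega_1 x_{1t} + \frac{\omega_2 x_{2t}}{1 - B^{12}} + \frac{\omega_3 x_{3t}}{1 - B^{12}} + \frac{(1 - \theta_1 B)(1 - \theta_2 B^{12})}{1 - B^{12}}\, a_t \tag{26} $$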


In this expression $x_{1t}$, $x_{2t}$, & $x_{3t}$ are indicator variables allowing for possible changes introduced by interventions.

$x_{1t}$ = 0 for t < Jan 60, 1 for t ≥ Jan 60. Allows for step change of size $\omega_1$ possibly associated with rule 63.

$x_{2t}$ = 1 for summer months 66 onwards, 0 for winter months 66 onwards. Produces a staircase function (step size $\omega_2$) to represent possible effect of new car engines in summer conditions.

$x_{3t}$ = 0 for summer months 66 onwards, 1 for winter months 66 onwards. Produces a staircase function (step size $\omega_3$) to represent possible effect of new car engines in winter.

[Figure 6. Monthly averages of hourly readings of O3 (pphm) at Downtown Los Angeles (1955-1972), with weight function for estimating the effect of intervening events in 1960.]
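To make the role of the indicator variables concrete, the sketch below builds a step indicator and shows how applying $1/(1 - B^{12})$ to a seasonal indicator produces a staircase that rises by one each year. The function names, the series length, and in particular the choice of which month positions count as "summer" are assumptions made for illustration; they are not taken from the ozone analysis.

```python
import numpy as np


def step_indicator(n, start):
    """0 before index `start`, 1 from `start` onwards."""
    x = np.zeros(n)
    x[start:] = 1.0
    return x


def seasonal_indicator(n, start, months, period=12):
    """1 at or after index `start` whenever (index mod period) is in `months`."""
    x = np.zeros(n)
    for t in range(start, n):
        if t % period in months:
            x[t] = 1.0
    return x


def seasonal_staircase(x, period=12):
    """Apply 1/(1 - B^period): s_t = s_{t-period} + x_t."""
    s = np.array(x, dtype=float)
    for t in range(period, len(s)):
        s[t] += s[t - period]
    return s


if __name__ == "__main__":
    n = 48                                   # four hypothetical years of monthly data
    summer = {5, 6, 7, 8, 9}                 # assumed summer month positions
    x1 = step_indicator(n, start=12)
    x2 = seasonal_indicator(n, start=24, months=summer)
    print(x1[10:14])
    print(seasonal_staircase(x2)[24:].reshape(2, 12))
```

Multiplying such a staircase by a step size gives a progressive year-on-year change confined to the chosen months, which is the kind of effect the model allows for after the introduction of the new engines.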


Estimates were obtained as follows

$\hat\omega_1 = -1.09 \pm .10$, $\hat\omega_2 = -.25 \pm .07$, $\hat\omega_3 = -.07 \pm .06$

(with $\hat\theta_1 = -.24 \pm .03$, $\hat\theta_2 = .55 \pm .06$)

This suggests that

(i) a step change of about -1.1 units occurred at about the time rule 63 was introduced.

(ii) progressive changes of about -0.25 units per year occurred in the summer months after the new engines were introduced.

(iii) no detectable corresponding effect occurred in the winter.

Seasonal Adjustment

It frequently happens that time series such as inventories of equipment items, army recruitment etc. are highly seasonal. Changes are much more readily understood if appropriate seasonal adjustments are made. An empirical method for separating seasonal series into (i) a seasonal component (ii) a trend component (iii) an additional error component has been discussed by Julius Shiskin (1967), [18], is presently used extensively, and is referred to as the X11 method. This method produces good results on the average. It is however unable to take account of the particular properties of individual series. New research suggests that a model-based approach (Box, Hillmer & Tiao 1976), [2] can accomplish this. For example Figure 7 shows results obtained by the X11 method & by the model based method on a time series for unemployed males in the United States 20 & over.

Research on stochastic difference equation models is currently undergoing vigorous development. In particular, research is being conducted into multivariate applications and into problems in control & the general identification of dynamic systems.

Figure 7. Separation of time series into components using X11 and Model-Based Procedure.